From patchwork Thu Nov 28 15:12:43 2024
From: Alexandru Elisei
To: will@kernel.org, julien.thierry.kdev@gmail.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Cc: maz@kernel.org, oliver.upton@linux.dev, apatel@ventanamicro.com,
	andre.przywara@arm.com, suzuki.poulose@arm.com,
	s.abdollahi22@imperial.ac.uk
Subject: [PATCH kvmtool 1/4] arm: Fix off-by-one errors when computing payload memory layout
Date: Thu, 28 Nov 2024 15:12:43 +0000
Message-ID: <20241128151246.10858-2-alexandru.elisei@arm.com>
In-Reply-To: <20241128151246.10858-1-alexandru.elisei@arm.com>
References: <20241128151246.10858-1-alexandru.elisei@arm.com>

In kvm__arch_load_kernel_image(), 'limit' is computed to be the topmost
byte address where the payload can reside. In all the read_file() calls,
the maximum size of the file being read is computed as limit - pos, which
is incorrect: either limit is inclusive, and the size should be
limit - pos + 1, or the maximum size is correct and limit is incorrectly
computed as inclusive.

After reserving space for the DTB, 'limit' is updated to point at the
first byte of the DTB, which contradicts the way it is initially
calculated: in theory, this makes it possible for the initrd (which is
copied below the DTB) to overwrite the first byte of the DTB. That is only
avoided by accident, not by design, because, as explained above, the
maximum size passed to read_file() for the initrd is one byte too small
(limit - pos instead of limit - pos + 1).

Let's get rid of this confusion and compute 'limit' as exclusive from the
start.
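A minimal sketch of the half-open convention adopted here, written as standalone C with plain integer addresses rather than kvmtool's host pointers. The helper names are hypothetical and not part of kvmtool. With an exclusive limit, the room left at pos is simply limit - pos, and a region that ends exactly where the next one begins does not overlap it:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating half-open [start, end) ranges.
 * Addresses are plain integers here, not kvmtool host pointers.
 */

/* Bytes available for a payload placed at pos when limit is exclusive. */
static inline uint64_t room_exclusive(uint64_t pos, uint64_t limit)
{
	assert(pos <= limit);
	return limit - pos;
}

/* The same room when last_byte is the inclusive top address. */
static inline uint64_t room_inclusive(uint64_t pos, uint64_t last_byte)
{
	assert(pos <= last_byte);
	return last_byte - pos + 1;
}

/* Two half-open ranges overlap iff each one starts before the other ends. */
static inline int ranges_overlap(uint64_t a_start, uint64_t a_end,
				 uint64_t b_start, uint64_t b_end)
{
	return a_start < b_end && b_start < a_end;
}
```

For example, with pos = 0x1000 and an exclusive limit of 0x2000, room_exclusive() gives 0x1000 bytes, the same as room_inclusive(0x1000, 0x1fff); mixing the two conventions is exactly what produces the one-byte discrepancy described above.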
Signed-off-by: Alexandru Elisei
---
 arm/kvm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index 9f9582326401..da0430c40c36 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -109,7 +109,7 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 	 * Linux requires the initrd and dtb to be mapped inside lowmem,
 	 * so we can't just place them at the top of memory.
 	 */
-	limit = kvm->ram_start + min(kvm->ram_size, (u64)SZ_256M) - 1;
+	limit = kvm->ram_start + min(kvm->ram_size, (u64)SZ_256M);
 
 	pos = kvm->ram_start + kvm__arch_get_kern_offset(kvm, fd_kernel);
 	kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
@@ -139,7 +139,7 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 	kvm->arch.dtb_guest_start = guest_addr;
 	pr_debug("Placing fdt at 0x%llx - 0x%llx",
 		 kvm->arch.dtb_guest_start,
-		 host_to_guest_flat(kvm, limit));
+		 host_to_guest_flat(kvm, limit - 1));
 	limit = pos;
 
 	/* ... and finally the initrd, if we have one. */

From patchwork Thu Nov 28 15:12:44 2024
From: Alexandru Elisei
To: will@kernel.org, julien.thierry.kdev@gmail.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Cc: maz@kernel.org, oliver.upton@linux.dev, apatel@ventanamicro.com,
	andre.przywara@arm.com, suzuki.poulose@arm.com,
	s.abdollahi22@imperial.ac.uk
Subject: [PATCH kvmtool 2/4] arm: Check return value for host_to_guest_flat()
Date: Thu, 28 Nov 2024 15:12:44 +0000
Message-ID: <20241128151246.10858-3-alexandru.elisei@arm.com>
In-Reply-To: <20241128151246.10858-1-alexandru.elisei@arm.com>
References: <20241128151246.10858-1-alexandru.elisei@arm.com>

kvmtool, on arm and arm64, puts the kernel, DTB and initrd (if present) in
a 256MB memory region that starts at the bottom of RAM.
kvm__arch_load_kernel_image() copies the kernel to the start of RAM,
places the DTB at the top of the region, and places the initrd immediately
below the DTB.

When the initrd is specified by the user, kvmtool checks that it doesn't
overlap with the kernel by computing the start address in the host's
address space:

	fstat(fd_initrd, &sb);
	pos = pos - (sb.st_size + INITRD_ALIGN);
	guest_addr = ALIGN(host_to_guest_flat(kvm, pos), INITRD_ALIGN);	(a)
	pos = guest_flat_to_host(kvm, guest_addr);			(b)

If the initrd is large enough to completely overwrite the kernel and start
below the guest RAM (pos < kvm->ram_start), then kvmtool will emit the
following errors:

  Warning: unable to translate host address 0xfffe849ffffc to guest	(1)
  Warning: unable to translate guest address 0x0 to host		(2)
  Fatal: initrd overlaps with kernel image.				(3)

(1) is because (a) calls host_to_guest_flat(kvm, pos) with a 'pos' outside
any of the memslots.

(2) is because guest_flat_to_host() is called at (b) with guest_addr=0,
which is what host_to_guest_flat() returns if the supplied address is not
found in any of the memslots. This warning is eliminated by this patch.
And finally, (3) is the most useful message, because it tells the user
what the error is.

The issue is a more general pattern in kvm__arch_load_kernel_image():
kvmtool doesn't check if host_to_guest_flat() returns 0, which means that
the host address is not within any of the memslots. Add a check for that,
which will at the very least remove the second warning.

This also fixes the following edge cases:

1. The same warnings being emitted in a similar scenario with the DTB,
   when the RAM is smaller than FDT_MAX_SIZE + FDT_ALIGN.

2. When copying the kernel, if the RAM size is smaller than the kernel
   offset, the start of the kernel (represented by the variable 'pos')
   will be outside the VA space allocated for the guest RAM. limit - pos
   will wrap around, because gcc 14.1.1 wraps the pointers (arithmetic on
   void pointers is not defined by C99). Then read_file()->..->read() will
   fail with EFAULT because the destination address is unallocated (as per
   man 2 read, also reproduced during testing).

Signed-off-by: Alexandru Elisei
---
 arm/kvm.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index da0430c40c36..4beae69e1fb3 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -113,6 +113,8 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 
 	pos = kvm->ram_start + kvm__arch_get_kern_offset(kvm, fd_kernel);
 	kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
+	if (!kvm->arch.kern_guest_start)
+		die("guest memory too small to contain the kernel");
 	file_size = read_file(fd_kernel, pos, limit - pos);
 	if (file_size < 0) {
 		if (errno == ENOMEM)
@@ -131,7 +133,10 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 	 */
 	pos = limit;
 	pos -= (FDT_MAX_SIZE + FDT_ALIGN);
-	guest_addr = ALIGN(host_to_guest_flat(kvm, pos), FDT_ALIGN);
+	guest_addr = host_to_guest_flat(kvm, pos);
+	if (!guest_addr)
+		die("fdt too big to contain in guest memory");
+	guest_addr = ALIGN(guest_addr, FDT_ALIGN);
 	pos = guest_flat_to_host(kvm, guest_addr);
 	if (pos < kernel_end)
 		die("fdt overlaps with kernel image.");
@@ -151,7 +156,10 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 			die_perror("fstat");
 
 		pos -= (sb.st_size + INITRD_ALIGN);
-		guest_addr = ALIGN(host_to_guest_flat(kvm, pos), INITRD_ALIGN);
+		guest_addr = host_to_guest_flat(kvm, pos);
+		if (!guest_addr)
+			die("initrd too big to fit in the payload memory region");
+		guest_addr = ALIGN(guest_addr, INITRD_ALIGN);
 		pos = guest_flat_to_host(kvm, guest_addr);
 		if (pos < kernel_end)
 			die("initrd overlaps with kernel image.");

From patchwork Thu Nov 28 15:12:45 2024
From: Alexandru Elisei
To: will@kernel.org, julien.thierry.kdev@gmail.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Cc: maz@kernel.org, oliver.upton@linux.dev, apatel@ventanamicro.com,
	andre.przywara@arm.com, suzuki.poulose@arm.com,
	s.abdollahi22@imperial.ac.uk
Subject: [PATCH kvmtool 3/4] arm64: Use the kernel header image_size when loading into memory
Date: Thu, 28 Nov 2024 15:12:45 +0000
Message-ID: <20241128151246.10858-4-alexandru.elisei@arm.com>
In-Reply-To: <20241128151246.10858-1-alexandru.elisei@arm.com>
References: <20241128151246.10858-1-alexandru.elisei@arm.com>

The field 'image_size' from the kernel header encodes the kernel size when
loaded in memory. This includes the BSS section, which is zeroed early
during boot (for example, in early_map_kernel() in Linux v6.12) and is not
reflected in the file size.

kvmtool, after loading the kernel image into memory, uses the file size,
not the image size, to compute the end of the kernel when checking for
overlaps. As a result, kvmtool doesn't detect when the DTB or initrd
overlaps with the in-memory kernel image as long as it doesn't overlap
with the file, and Linux then silently overwrites the DTB or the initrd
with zeroes during boot. This kind of issue, when it happens, is not
trivial to debug.
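The size selection and the overlap test it feeds can be sketched as standalone helpers. This is an illustration with plain integer addresses and hypothetical helper names, not kvmtool code; it mirrors the 'kernel_size < file_size' fallback and the 'pos < kernel_end' check that the diff below adds to arm/kvm.c, and assumes image_size has already been converted from little-endian:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: size used for the end of the in-memory kernel.
 * image_size is the header field in CPU byte order (0 if unavailable);
 * file_size is the number of bytes read from the Image file.
 */
static inline uint64_t effective_kernel_size(uint64_t image_size,
					     uint64_t file_size)
{
	/* Fall back to the file size for a missing or too-small header value. */
	if (image_size == 0 || image_size < file_size)
		return file_size;
	return image_size;
}

/*
 * Nonzero if a payload (DTB or initrd) starting at payload_start begins
 * below the end of the in-memory kernel. Payloads are placed downwards
 * from the top of the region, so comparing against the kernel end is
 * enough.
 */
static inline int overlaps_kernel(uint64_t kernel_start, uint64_t kernel_size,
				  uint64_t payload_start)
{
	uint64_t kernel_end = kernel_start + kernel_size;

	assert(kernel_end >= kernel_start);	/* no address wrap-around */
	return payload_start < kernel_end;
}
```

With a 1MB file whose header reports a 3MB image loaded at 0x80000, a DTB starting at 0x200000 lies past the end of the file but still inside the BSS; using the file size alone would miss that, while overlaps_kernel() with the effective size reports it.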
kvmtool already reads the image header to get the kernel offset, so expand
on that to also read the image size, and use it instead of the file size
for memory layout calculations.

Signed-off-by: Alexandru Elisei
---
 arm/aarch32/include/kvm/kvm-arch.h |  2 +
 arm/aarch64/include/kvm/kvm-arch.h |  5 +-
 arm/aarch64/kvm.c                  | 80 +++++++++++++++++++++++-------
 arm/kvm.c                          | 15 ++++--
 4 files changed, 78 insertions(+), 24 deletions(-)

diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
index 467fb09175b8..07d711e2f4c1 100644
--- a/arm/aarch32/include/kvm/kvm-arch.h
+++ b/arm/aarch32/include/kvm/kvm-arch.h
@@ -4,8 +4,10 @@
 #include
 
 #define kvm__arch_get_kern_offset(...) 0x8000
+#define kvm__arch_get_kernel_size(...) 0
 
 struct kvm;
+static inline void kvm__arch_read_kernel_header(struct kvm *kvm, int fd) {}
 static inline void kvm__arch_enable_mte(struct kvm *kvm) {}
 
 #define MAX_PAGE_SIZE SZ_4K
diff --git a/arm/aarch64/include/kvm/kvm-arch.h b/arm/aarch64/include/kvm/kvm-arch.h
index 02d09a413831..97ab42485158 100644
--- a/arm/aarch64/include/kvm/kvm-arch.h
+++ b/arm/aarch64/include/kvm/kvm-arch.h
@@ -4,7 +4,10 @@
 #include
 
 struct kvm;
-unsigned long long kvm__arch_get_kern_offset(struct kvm *kvm, int fd);
+void kvm__arch_read_kernel_header(struct kvm *kvm, int fd);
+unsigned long long kvm__arch_get_kern_offset(struct kvm *kvm);
+u64 kvm__arch_get_kernel_size(struct kvm *kvm);
+
 int kvm__arch_get_ipa_limit(struct kvm *kvm);
 void kvm__arch_enable_mte(struct kvm *kvm);
 
diff --git a/arm/aarch64/kvm.c b/arm/aarch64/kvm.c
index 54200c9eec9d..6fcc828cbe01 100644
--- a/arm/aarch64/kvm.c
+++ b/arm/aarch64/kvm.c
@@ -8,6 +8,8 @@
 
 #include
 
+static struct arm64_image_header *kernel_header;
+
 int vcpu_affinity_parser(const struct option *opt, const char *arg, int unset)
 {
 	struct kvm *kvm = opt->ptr;
@@ -57,50 +59,82 @@ u64 kvm__arch_default_ram_address(void)
 	return ARM_MEMORY_AREA;
 }
 
+void kvm__arch_read_kernel_header(struct kvm *kvm, int fd)
+{
+	const char *debug_str;
+	off_t cur_offset;
+	ssize_t size;
+
+	if (kvm->cfg.arch.aarch32_guest)
+		return;
+
+	kernel_header = malloc(sizeof(*kernel_header));
+	if (!kernel_header)
+		return;
+
+	cur_offset = lseek(fd, 0, SEEK_CUR);
+	if (cur_offset == (off_t)-1 || lseek(fd, 0, SEEK_SET) == (off_t)-1) {
+		debug_str = "Failed to seek in kernel image file";
+		goto fail;
+	}
+
+	size = xread(fd, kernel_header, sizeof(*kernel_header));
+	if (size < 0 || (size_t)size < sizeof(*kernel_header))
+		die("Failed to read kernel image header");
+
+	lseek(fd, cur_offset, SEEK_SET);
+
+	if (memcmp(&kernel_header->magic, ARM64_IMAGE_MAGIC, sizeof(kernel_header->magic))) {
+		debug_str = "Kernel image magic not matching";
+		kernel_header = NULL;
+		goto fail;
+	}
+
+	return;
+
+fail:
+	pr_debug("%s, using defaults", debug_str);
+}
+
 /*
  * Return the TEXT_OFFSET value that the guest kernel expects. Note
  * that pre-3.17 kernels expose this value using the native endianness
  * instead of Little-Endian. BE kernels of this vintage may fail to
  * boot. See Documentation/arm64/booting.rst in your local kernel tree.
  */
-unsigned long long kvm__arch_get_kern_offset(struct kvm *kvm, int fd)
+unsigned long long kvm__arch_get_kern_offset(struct kvm *kvm)
 {
-	struct arm64_image_header header;
-	off_t cur_offset;
-	ssize_t size;
 	const char *debug_str;
 
 	/* the 32bit kernel offset is a well known value */
 	if (kvm->cfg.arch.aarch32_guest)
 		return 0x8000;
 
-	cur_offset = lseek(fd, 0, SEEK_CUR);
-	if (cur_offset == (off_t)-1 ||
-	    lseek(fd, 0, SEEK_SET) == (off_t)-1) {
-		debug_str = "Failed to seek in kernel image file";
+	if (!kernel_header) {
+		debug_str = "Kernel header is missing";
 		goto default_offset;
 	}
 
-	size = xread(fd, &header, sizeof(header));
-	if (size < 0 || (size_t)size < sizeof(header))
-		die("Failed to read kernel image header");
-
-	lseek(fd, cur_offset, SEEK_SET);
-
-	if (memcmp(&header.magic, ARM64_IMAGE_MAGIC, sizeof(header.magic))) {
-		debug_str = "Kernel image magic not matching";
+	if (!le64_to_cpu(kernel_header->image_size)) {
+		debug_str = "Image size is 0";
 		goto default_offset;
 	}
 
-	if (le64_to_cpu(header.image_size))
-		return le64_to_cpu(header.text_offset);
+	return le64_to_cpu(kernel_header->text_offset);
 
-	debug_str = "Image size is 0";
 default_offset:
 	pr_debug("%s, assuming TEXT_OFFSET to be 0x80000", debug_str);
 	return 0x80000;
 }
 
+u64 kvm__arch_get_kernel_size(struct kvm *kvm)
+{
+	if (kvm->cfg.arch.aarch32_guest || !kernel_header)
+		return 0;
+
+	return le64_to_cpu(kernel_header->image_size);
+}
+
 int kvm__arch_get_ipa_limit(struct kvm *kvm)
 {
 	int ret;
@@ -160,3 +194,11 @@ void kvm__arch_enable_mte(struct kvm *kvm)
 
 	pr_debug("MTE capability enabled");
 }
+
+static int kvm__arch_free_kernel_header(struct kvm *kvm)
+{
+	free(kernel_header);
+
+	return 0;
+}
+late_exit(kvm__arch_free_kernel_header);
diff --git a/arm/kvm.c b/arm/kvm.c
index 4beae69e1fb3..9013be489aff 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -104,6 +104,7 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 	void *pos, *kernel_end, *limit;
 	unsigned long guest_addr;
 	ssize_t file_size;
+	u64 kernel_size;
 
 	/*
 	 * Linux requires the initrd and dtb to be mapped inside lowmem,
@@ -111,7 +112,9 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 	 */
 	limit = kvm->ram_start + min(kvm->ram_size, (u64)SZ_256M);
 
-	pos = kvm->ram_start + kvm__arch_get_kern_offset(kvm, fd_kernel);
+	kvm__arch_read_kernel_header(kvm, fd_kernel);
+
+	pos = kvm->ram_start + kvm__arch_get_kern_offset(kvm);
 	kvm->arch.kern_guest_start = host_to_guest_flat(kvm, pos);
 	if (!kvm->arch.kern_guest_start)
 		die("guest memory too small to contain the kernel");
@@ -122,9 +125,13 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 		die_perror("kernel read");
 	}
 
-	kernel_end = pos + file_size;
-	pr_debug("Loaded kernel to 0x%llx (%zd bytes)",
-		 kvm->arch.kern_guest_start, file_size);
+
+	kernel_size = kvm__arch_get_kernel_size(kvm);
+	if (!kernel_size || kernel_size < (u64)file_size)
+		kernel_size = file_size;
+	kernel_end = pos + kernel_size;
+	pr_debug("Loaded kernel to 0x%llx (%llu bytes)",
+		 kvm->arch.kern_guest_start, kernel_size);
 
 	/*
 	 * Now load backwards from the end of memory so the kernel

From patchwork Thu Nov 28 15:12:46 2024
From: Alexandru Elisei
To: will@kernel.org, julien.thierry.kdev@gmail.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev
Cc: maz@kernel.org, oliver.upton@linux.dev, apatel@ventanamicro.com,
	andre.przywara@arm.com, suzuki.poulose@arm.com,
	s.abdollahi22@imperial.ac.uk
Subject: [RFC PATCH kvmtool 4/4] arm64: Increase the payload memory region size to 512MB
Date: Thu, 28 Nov 2024 15:12:46 +0000
Message-ID: <20241128151246.10858-5-alexandru.elisei@arm.com>
In-Reply-To: <20241128151246.10858-1-alexandru.elisei@arm.com>
References: <20241128151246.10858-1-alexandru.elisei@arm.com>

kvmtool uses the same memory map for 64bit and 32bit guests, where it
copies the kernel, the initrd and the DTB into the bottom 256MB.
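The exclusive top of the payload region can be sketched in isolation as below. This is a standalone approximation, assuming plain integer addresses instead of kvmtool's host pointers and locally defined size constants; the function names are hypothetical. The 256MB/512MB split by guest type follows kvm__arch_get_payload_region_size() as added by this patch, and the region is clamped to the RAM size as in kvm__arch_load_kernel_image():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's SZ_256M/SZ_512M constants (assumption). */
#define PAYLOAD_SZ_256M (256ULL << 20)
#define PAYLOAD_SZ_512M (512ULL << 20)

/* Region size per guest type, as selected by this patch. */
static inline uint64_t payload_region_size(bool aarch32_guest)
{
	return aarch32_guest ? PAYLOAD_SZ_256M : PAYLOAD_SZ_512M;
}

/*
 * Exclusive top of the payload region: the region starts at the bottom
 * of RAM and is clamped to the RAM size when RAM is smaller.
 */
static inline uint64_t payload_limit(uint64_t ram_start, uint64_t ram_size,
				     bool aarch32_guest)
{
	uint64_t region = payload_region_size(aarch32_guest);
	uint64_t size = ram_size < region ? ram_size : region;

	assert(ram_start + size >= ram_start);	/* no address wrap-around */
	return ram_start + size;
}
```

With 1GB of RAM, an arm64 guest gets a 512MB payload region while an aarch32 guest keeps 256MB; with only 128MB of RAM, both are clamped to the RAM size.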
The restriction on placing everything in the bottom 256MB comes from the
aarch32 boot protocol, where the kernel, DTB and initrd must be placed in
a region covered by the low-memory mapping. The size of the lowmem region
varies based on the kernel-userspace split, which is a Kconfig option, and
on the vmalloc area size, which can be specified by the user on the kernel
command line. Hence kvmtool uses the bottom 256MB, a reasonable compromise
that has worked well so far.

Sina reported privately that they were unable to create a 64bit virtual
machine with a 351MB initrd, and that's due to the 256MB restriction
placed on the arm64 payload layout.

This restriction is not found in the arm64 boot protocol: booting.rst in
the Linux v6.12 source tree specifies that the kernel and initrd must be
placed in the same 32GB window. There is also a mention of kernels prior
to v4.2 requiring the DTB to be placed within a 512MB region starting at
the kernel image minus the kernel offset.

Increase the payload region size to 512MB for arm64, which will provide
maximum compatibility with Linux guests, while allowing for larger initrds
or kernel images. This means that the gap between the DTB (or initrd, if
present) and the kernel is larger now.

For 32 bit guests, the payload region size has been kept unchanged,
because it has proven adequate so far.

Reported-by: Abdollahi Sina
Signed-off-by: Alexandru Elisei
---
 arm/aarch32/include/kvm/kvm-arch.h | 5 +++--
 arm/aarch64/include/kvm/kvm-arch.h | 2 ++
 arm/aarch64/kvm.c                  | 8 ++++++++
 arm/kvm.c                          | 6 ++++--
 4 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arm/aarch32/include/kvm/kvm-arch.h b/arm/aarch32/include/kvm/kvm-arch.h
index 07d711e2f4c1..0333cf4355ac 100644
--- a/arm/aarch32/include/kvm/kvm-arch.h
+++ b/arm/aarch32/include/kvm/kvm-arch.h
@@ -3,8 +3,9 @@
 
 #include
 
-#define kvm__arch_get_kern_offset(...) 0x8000
-#define kvm__arch_get_kernel_size(...) 0
+#define kvm__arch_get_kern_offset(...)		0x8000
+#define kvm__arch_get_kernel_size(...)		0
+#define kvm__arch_get_payload_region_size(...)	SZ_256M
 
 struct kvm;
 static inline void kvm__arch_read_kernel_header(struct kvm *kvm, int fd) {}
diff --git a/arm/aarch64/include/kvm/kvm-arch.h b/arm/aarch64/include/kvm/kvm-arch.h
index 97ab42485158..2d1a4ed8cea4 100644
--- a/arm/aarch64/include/kvm/kvm-arch.h
+++ b/arm/aarch64/include/kvm/kvm-arch.h
@@ -8,6 +8,8 @@ void kvm__arch_read_kernel_header(struct kvm *kvm, int fd);
 unsigned long long kvm__arch_get_kern_offset(struct kvm *kvm);
 u64 kvm__arch_get_kernel_size(struct kvm *kvm);
 
+u64 kvm__arch_get_payload_region_size(struct kvm *kvm);
+
 int kvm__arch_get_ipa_limit(struct kvm *kvm);
 void kvm__arch_enable_mte(struct kvm *kvm);
 
diff --git a/arm/aarch64/kvm.c b/arm/aarch64/kvm.c
index 6fcc828cbe01..98b24375ee98 100644
--- a/arm/aarch64/kvm.c
+++ b/arm/aarch64/kvm.c
@@ -135,6 +135,14 @@ u64 kvm__arch_get_kernel_size(struct kvm *kvm)
 	return le64_to_cpu(kernel_header->image_size);
 }
 
+u64 kvm__arch_get_payload_region_size(struct kvm *kvm)
+{
+	if (kvm->cfg.arch.aarch32_guest)
+		return SZ_256M;
+
+	return SZ_512M;
+}
+
 int kvm__arch_get_ipa_limit(struct kvm *kvm)
 {
 	int ret;
diff --git a/arm/kvm.c b/arm/kvm.c
index 9013be489aff..7b2b49e21498 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -103,14 +103,16 @@ bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 {
 	void *pos, *kernel_end, *limit;
 	unsigned long guest_addr;
+	u64 payload_region_size;
 	ssize_t file_size;
 	u64 kernel_size;
 
+	payload_region_size = kvm__arch_get_payload_region_size(kvm);
 	/*
-	 * Linux requires the initrd and dtb to be mapped inside lowmem,
+	 * Linux for arm requires the initrd and dtb to be mapped inside lowmem,
 	 * so we can't just place them at the top of memory.
 	 */
-	limit = kvm->ram_start + min(kvm->ram_size, (u64)SZ_256M);
+	limit = kvm->ram_start + min(kvm->ram_size, payload_region_size);
 
 	kvm__arch_read_kernel_header(kvm, fd_kernel);