diff mbox series

[PULL,10/56] x86: don't let decompressed kernel image clobber setup_data

Message ID 20230130201810.11518-11-mst@redhat.com (mailing list archive)
State New, archived
Headers show
Series [PULL,01/56] shpc: disallow unplug when power indicator is blinking | expand

Commit Message

Michael S. Tsirkin Jan. 30, 2023, 8:19 p.m. UTC
From: "Jason A. Donenfeld" <Jason@zx2c4.com>

The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
---
 include/hw/i386/microvm.h |  5 ++--
 include/hw/nvram/fw_cfg.h |  9 +++++++
 hw/i386/microvm.c         | 15 +++++++----
 hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
 hw/nvram/fw_cfg.c         |  9 +++++++
 5 files changed, 59 insertions(+), 31 deletions(-)

Comments

Jason A. Donenfeld Jan. 31, 2023, 7:39 p.m. UTC | #1
On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> 
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x100000, setup_data lives at
> `0x100000 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x1000000 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x100000 + compressed_size` that extends into
> the decompressed zone at 0x1000000. In other words, if compressed_size
> is larger than `0x1000000 - 0x100000`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>           kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
> The problem is that this decompresses to 0x1000000 (one more zero). So
> if l1 is > (0x1000000-0x100000), then this winds up looking like:
> 
>           kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
>                                  d e c o m p r e s s e d   k e r n e l
>                      |-------------------------------------------------------------|
>                 0x1000000                                                     0x1000000+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Fixup-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: x86@kernel.org
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Tested-by: Eric Biggers <ebiggers@google.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>

This one should wind up in the stable point release too. Dunno what the
procedure for that is.

Jason
H. Peter Anvin Jan. 31, 2023, 8:54 p.m. UTC | #2
On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>
>The setup_data links are appended to the compressed kernel image. Since
>the kernel image is typically loaded at 0x100000, setup_data lives at
>`0x100000 + compressed_size`, which does not get relocated during the
>kernel's boot process.
>
>The kernel typically decompresses the image starting at address
>0x1000000 (note: there's one more zero there than the compressed image
>above). This usually is fine for most kernels.
>
>However, if the compressed image is actually quite large, then
>setup_data will live at a `0x100000 + compressed_size` that extends into
>the decompressed zone at 0x1000000. In other words, if compressed_size
>is larger than `0x1000000 - 0x100000`, then the decompression step will
>clobber setup_data, resulting in crashes.
>
>Visually, what happens now is that QEMU appends setup_data to the kernel
>image:
>
>          kernel image            setup_data
>   |--------------------------||----------------|
>0x100000                  0x100000+l1     0x100000+l1+l2
>
>The problem is that this decompresses to 0x1000000 (one more zero). So
>if l1 is > (0x1000000-0x100000), then this winds up looking like:
>
>          kernel image            setup_data
>   |--------------------------||----------------|
>0x100000                  0x100000+l1     0x100000+l1+l2
>
>                                 d e c o m p r e s s e d   k e r n e l
>                     |-------------------------------------------------------------|
>                0x1000000                                                     0x1000000+l3
>
>The decompressed kernel seemingly overwriting the compressed kernel
>image isn't a problem, because that gets relocated to a higher address
>early on in the boot process, at the end of startup_64. setup_data,
>however, stays in the same place, since those links are self referential
>and nothing fixes them up.  So the decompressed kernel clobbers it.
>
>Fix this by appending setup_data to the cmdline blob rather than the
>kernel image blob, which remains at a lower address that won't get
>clobbered.
>
>This could have been done by overwriting the initrd blob instead, but
>that poses big difficulties, such as no longer being able to use memory
>mapped files for initrd, hurting performance, and, more importantly, the
>initrd address calculation is hard coded in qboot, and it always grows
>down rather than up, which means lots of brittle semantics would have to
>be changed around, incurring more complexity. In contrast, using cmdline
>is simple and doesn't interfere with anything.
>
>The microvm machine has a gross hack where it fiddles with fw_cfg data
>after the fact. So this hack is updated to account for this appending,
>by reserving some bytes.
>
>Fixup-by: Michael S. Tsirkin <mst@redhat.com>
>Cc: x86@kernel.org
>Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>Cc: H. Peter Anvin <hpa@zytor.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: Eric Biggers <ebiggers@kernel.org>
>Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
>Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
>Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>Tested-by: Eric Biggers <ebiggers@google.com>
>Tested-by: Mathias Krause <minipli@grsecurity.net>
>---
> include/hw/i386/microvm.h |  5 ++--
> include/hw/nvram/fw_cfg.h |  9 +++++++
> hw/i386/microvm.c         | 15 +++++++----
> hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
> hw/nvram/fw_cfg.c         |  9 +++++++
> 5 files changed, 59 insertions(+), 31 deletions(-)
>
>diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>index fad97a891d..e8af61f194 100644
>--- a/include/hw/i386/microvm.h
>+++ b/include/hw/i386/microvm.h
>@@ -50,8 +50,9 @@
>  */
> 
> /* Platform virtio definitions */
>-#define VIRTIO_MMIO_BASE      0xfeb00000
>-#define VIRTIO_CMDLINE_MAXLEN 64
>+#define VIRTIO_MMIO_BASE                0xfeb00000
>+#define VIRTIO_CMDLINE_MAXLEN           64
>+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) * 16)
> 
> #define GED_MMIO_BASE         0xfea00000
> #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
>diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
>index 2e503904dc..990dcdbb2e 100644
>--- a/include/hw/nvram/fw_cfg.h
>+++ b/include/hw/nvram/fw_cfg.h
>@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
>                                void *data, size_t len,
>                                bool read_only);
> 
>+/**
>+ * fw_cfg_read_bytes_ptr:
>+ * @s: fw_cfg device being modified
>+ * @key: selector key value for new fw_cfg item
>+ *
>+ * Reads an existing fw_cfg data pointer.
>+ */
>+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
>+
> /**
>  * fw_cfg_add_string:
>  * @s: fw_cfg device being modified
>diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>index 170a331e3f..29f30dd6d3 100644
>--- a/hw/i386/microvm.c
>+++ b/hw/i386/microvm.c
>@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
>     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>     BusState *bus;
>     BusChild *kid;
>-    char *cmdline;
>+    char *cmdline, *existing_cmdline;
>+    size_t len;
> 
>     /*
>      * Find MMIO transports with attached devices, and add them to the kernel
>@@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
>      * Yes, this is a hack, but one that heavily improves the UX without
>      * introducing any significant issues.
>      */
>-    cmdline = g_strdup(machine->kernel_cmdline);
>+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA);
>+    cmdline = g_strdup(existing_cmdline);
>     bus = sysbus_get_default();
>     QTAILQ_FOREACH(kid, &bus->children, sibling) {
>         DeviceState *dev = kid->child;
>@@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
>         }
>     }
> 
>-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
>-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>-
>+    len = strlen(cmdline);
>+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
>+        fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n");
>+    } else {
>+        memcpy(existing_cmdline, cmdline, len + 1);
>+    }
>     g_free(cmdline);
> }
> 
>diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>index 78cc131926..eaff4227bd 100644
>--- a/hw/i386/x86.c
>+++ b/hw/i386/x86.c
>@@ -50,6 +50,7 @@
> #include "hw/intc/i8259.h"
> #include "hw/rtc/mc146818rtc.h"
> #include "target/i386/sev.h"
>+#include "hw/i386/microvm.h"
> 
> #include "hw/acpi/cpu_hotplug.h"
> #include "hw/irq.h"
>@@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms,
>     const char *kernel_filename = machine->kernel_filename;
>     const char *initrd_filename = machine->initrd_filename;
>     const char *dtb_filename = machine->dtb;
>-    const char *kernel_cmdline = machine->kernel_cmdline;
>+    char *kernel_cmdline;
>     SevKernelLoaderContext sev_load_ctx = {};
>     enum { RNG_SEED_LENGTH = 32 };
> 
>-    /* Align to 16 bytes as a paranoia measure */
>-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
>+    /*
>+     * Add the NUL terminator, some padding for the microvm cmdline fiddling
>+     * hack, and then align to 16 bytes as a paranoia measure
>+     */
>+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
>+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
>+    /* Make a copy, since we might append arbitrary bytes to it later. */
>+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
> 
>     /* load the kernel header */
>     f = fopen(kernel_filename, "rb");
>@@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms,
>         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
>     }
> 
>-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
>-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
>-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
>-
>     if (protocol >= 0x202) {
>         stl_p(header + 0x228, cmdline_addr);
>     } else {
>@@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms,
>             exit(1);
>         }
> 
>-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
>-        kernel = g_realloc(kernel, kernel_size);
>-
>-
>-        setup_data = (SetupData *)(kernel + setup_data_offset);
>+        setup_data_offset = cmdline_size;
>+        cmdline_size += sizeof(SetupData) + dtb_size;
>+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
>+        setup_data = (void *)kernel_cmdline + setup_data_offset;
>         setup_data->next = cpu_to_le64(first_setup_data);
>-        first_setup_data = prot_addr + setup_data_offset;
>+        first_setup_data = cmdline_addr + setup_data_offset;
>         setup_data->type = cpu_to_le32(SETUP_DTB);
>         setup_data->len = cpu_to_le32(dtb_size);
>-
>         load_image_size(dtb_filename, setup_data->data, dtb_size);
>     }
> 
>-    if (!legacy_no_rng_seed) {
>-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>-        kernel_size = setup_data_offset + sizeof(SetupData) + RNG_SEED_LENGTH;
>-        kernel = g_realloc(kernel, kernel_size);
>-        setup_data = (SetupData *)(kernel + setup_data_offset);
>+    if (!legacy_no_rng_seed && protocol >= 0x209) {
>+        setup_data_offset = cmdline_size;
>+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
>+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
>+        setup_data = (void *)kernel_cmdline + setup_data_offset;
>         setup_data->next = cpu_to_le64(first_setup_data);
>-        first_setup_data = prot_addr + setup_data_offset;
>+        first_setup_data = cmdline_addr + setup_data_offset;
>         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
>         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
>         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
>@@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms,
>         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
>     }
> 
>+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
>+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, cmdline_size);
>+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
>+    sev_load_ctx.cmdline_size = cmdline_size;
>+
>     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
>     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
>     sev_load_ctx.kernel_data = (char *)kernel;
>@@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms,
>      * kernel on the other side of the fw_cfg interface matches the hash of the
>      * file the user passed in.
>      */
>-    if (!sev_enabled()) {
>+    if (!sev_enabled() && first_setup_data) {
>         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
> 
>         memcpy(setup, header, MIN(sizeof(header), setup_size));
>diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>index a00881bc64..432754eda4 100644
>--- a/hw/nvram/fw_cfg.c
>+++ b/hw/nvram/fw_cfg.c
>@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
>     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
> }
> 
>+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
>+{
>+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
>+
>+    key &= FW_CFG_ENTRY_MASK;
>+    assert(key < fw_cfg_max_entry(s));
>+    return s->entries[arch][key].data;
>+}
>+
> void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
> {
>     size_t sz = strlen(value) + 1;

Saying they are "appended to" is wrong; the loader is free to put them anywhere in usable RAM that is not covered by the kernel image, the kernel keepout area, the command line or initrd.
Jason A. Donenfeld Jan. 31, 2023, 9:22 p.m. UTC | #3
On Tue, Jan 31, 2023, 15:55 H. Peter Anvin <hpa@zytor.com> wrote:

> On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" <mst@redhat.com>
> wrote:
> >From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> >
> >The setup_data links are appended to the compressed kernel image. Since
> >the kernel image is typically loaded at 0x100000, setup_data lives at
> >`0x100000 + compressed_size`, which does not get relocated during the
> >kernel's boot process.
> >
> >The kernel typically decompresses the image starting at address
> >0x1000000 (note: there's one more zero there than the compressed image
> >above). This usually is fine for most kernels.
> >
> >However, if the compressed image is actually quite large, then
> >setup_data will live at a `0x100000 + compressed_size` that extends into
> >the decompressed zone at 0x1000000. In other words, if compressed_size
> >is larger than `0x1000000 - 0x100000`, then the decompression step will
> >clobber setup_data, resulting in crashes.
> >
> >Visually, what happens now is that QEMU appends setup_data to the kernel
> >image:
> >
> >          kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                  0x100000+l1     0x100000+l1+l2
> >
> >The problem is that this decompresses to 0x1000000 (one more zero). So
> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
> >
> >          kernel image            setup_data
> >   |--------------------------||----------------|
> >0x100000                  0x100000+l1     0x100000+l1+l2
> >
> >                                 d e c o m p r e s s e d   k e r n e l
> >
>  |-------------------------------------------------------------|
> >                0x1000000
>      0x1000000+l3
> >
> >The decompressed kernel seemingly overwriting the compressed kernel
> >image isn't a problem, because that gets relocated to a higher address
> >early on in the boot process, at the end of startup_64. setup_data,
> >however, stays in the same place, since those links are self referential
> >and nothing fixes them up.  So the decompressed kernel clobbers it.
> >
> >Fix this by appending setup_data to the cmdline blob rather than the
> >kernel image blob, which remains at a lower address that won't get
> >clobbered.
> >
> >This could have been done by overwriting the initrd blob instead, but
> >that poses big difficulties, such as no longer being able to use memory
> >mapped files for initrd, hurting performance, and, more importantly, the
> >initrd address calculation is hard coded in qboot, and it always grows
> >down rather than up, which means lots of brittle semantics would have to
> >be changed around, incurring more complexity. In contrast, using cmdline
> >is simple and doesn't interfere with anything.
> >
> >The microvm machine has a gross hack where it fiddles with fw_cfg data
> >after the fact. So this hack is updated to account for this appending,
> >by reserving some bytes.
> >
> >Fixup-by: Michael S. Tsirkin <mst@redhat.com>
> >Cc: x86@kernel.org
> >Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> >Cc: H. Peter Anvin <hpa@zytor.com>
> >Cc: Borislav Petkov <bp@alien8.de>
> >Cc: Eric Biggers <ebiggers@kernel.org>
> >Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> >Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> >Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> >Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >Tested-by: Eric Biggers <ebiggers@google.com>
> >Tested-by: Mathias Krause <minipli@grsecurity.net>
> >---
> > include/hw/i386/microvm.h |  5 ++--
> > include/hw/nvram/fw_cfg.h |  9 +++++++
> > hw/i386/microvm.c         | 15 +++++++----
> > hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
> > hw/nvram/fw_cfg.c         |  9 +++++++
> > 5 files changed, 59 insertions(+), 31 deletions(-)
> >
> >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
> >index fad97a891d..e8af61f194 100644
> >--- a/include/hw/i386/microvm.h
> >+++ b/include/hw/i386/microvm.h
> >@@ -50,8 +50,9 @@
> >  */
> >
> > /* Platform virtio definitions */
> >-#define VIRTIO_MMIO_BASE      0xfeb00000
> >-#define VIRTIO_CMDLINE_MAXLEN 64
> >+#define VIRTIO_MMIO_BASE                0xfeb00000
> >+#define VIRTIO_CMDLINE_MAXLEN           64
> >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) *
> 16)
> >
> > #define GED_MMIO_BASE         0xfea00000
> > #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
> >diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> >index 2e503904dc..990dcdbb2e 100644
> >--- a/include/hw/nvram/fw_cfg.h
> >+++ b/include/hw/nvram/fw_cfg.h
> >@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s,
> uint16_t key,
> >                                void *data, size_t len,
> >                                bool read_only);
> >
> >+/**
> >+ * fw_cfg_read_bytes_ptr:
> >+ * @s: fw_cfg device being modified
> >+ * @key: selector key value for new fw_cfg item
> >+ *
> >+ * Reads an existing fw_cfg data pointer.
> >+ */
> >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
> >+
> > /**
> >  * fw_cfg_add_string:
> >  * @s: fw_cfg device being modified
> >diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> >index 170a331e3f..29f30dd6d3 100644
> >--- a/hw/i386/microvm.c
> >+++ b/hw/i386/microvm.c
> >@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState
> *machine)
> >     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
> >     BusState *bus;
> >     BusChild *kid;
> >-    char *cmdline;
> >+    char *cmdline, *existing_cmdline;
> >+    size_t len;
> >
> >     /*
> >      * Find MMIO transports with attached devices, and add them to the
> kernel
> >@@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState
> *machine)
> >      * Yes, this is a hack, but one that heavily improves the UX without
> >      * introducing any significant issues.
> >      */
> >-    cmdline = g_strdup(machine->kernel_cmdline);
> >+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg,
> FW_CFG_CMDLINE_DATA);
> >+    cmdline = g_strdup(existing_cmdline);
> >     bus = sysbus_get_default();
> >     QTAILQ_FOREACH(kid, &bus->children, sibling) {
> >         DeviceState *dev = kid->child;
> >@@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState
> *machine)
> >         }
> >     }
> >
> >-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE,
> strlen(cmdline) + 1);
> >-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
> >-
> >+    len = strlen(cmdline);
> >+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
> >+        fprintf(stderr, "qemu: virtio mmio cmdline too large,
> skipping\n");
> >+    } else {
> >+        memcpy(existing_cmdline, cmdline, len + 1);
> >+    }
> >     g_free(cmdline);
> > }
> >
> >diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> >index 78cc131926..eaff4227bd 100644
> >--- a/hw/i386/x86.c
> >+++ b/hw/i386/x86.c
> >@@ -50,6 +50,7 @@
> > #include "hw/intc/i8259.h"
> > #include "hw/rtc/mc146818rtc.h"
> > #include "target/i386/sev.h"
> >+#include "hw/i386/microvm.h"
> >
> > #include "hw/acpi/cpu_hotplug.h"
> > #include "hw/irq.h"
> >@@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms,
> >     const char *kernel_filename = machine->kernel_filename;
> >     const char *initrd_filename = machine->initrd_filename;
> >     const char *dtb_filename = machine->dtb;
> >-    const char *kernel_cmdline = machine->kernel_cmdline;
> >+    char *kernel_cmdline;
> >     SevKernelLoaderContext sev_load_ctx = {};
> >     enum { RNG_SEED_LENGTH = 32 };
> >
> >-    /* Align to 16 bytes as a paranoia measure */
> >-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
> >+    /*
> >+     * Add the NUL terminator, some padding for the microvm cmdline
> fiddling
> >+     * hack, and then align to 16 bytes as a paranoia measure
> >+     */
> >+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
> >+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
> >+    /* Make a copy, since we might append arbitrary bytes to it later. */
> >+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
> >
> >     /* load the kernel header */
> >     f = fopen(kernel_filename, "rb");
> >@@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms,
> >         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
> >     }
> >
> >-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
> >-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) +
> 1);
> >-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
> >-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
> >-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
> >-
> >     if (protocol >= 0x202) {
> >         stl_p(header + 0x228, cmdline_addr);
> >     } else {
> >@@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms,
> >             exit(1);
> >         }
> >
> >-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> >-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
> >-        kernel = g_realloc(kernel, kernel_size);
> >-
> >-
> >-        setup_data = (SetupData *)(kernel + setup_data_offset);
> >+        setup_data_offset = cmdline_size;
> >+        cmdline_size += sizeof(SetupData) + dtb_size;
> >+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> >+        setup_data = (void *)kernel_cmdline + setup_data_offset;
> >         setup_data->next = cpu_to_le64(first_setup_data);
> >-        first_setup_data = prot_addr + setup_data_offset;
> >+        first_setup_data = cmdline_addr + setup_data_offset;
> >         setup_data->type = cpu_to_le32(SETUP_DTB);
> >         setup_data->len = cpu_to_le32(dtb_size);
> >-
> >         load_image_size(dtb_filename, setup_data->data, dtb_size);
> >     }
> >
> >-    if (!legacy_no_rng_seed) {
> >-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
> >-        kernel_size = setup_data_offset + sizeof(SetupData) +
> RNG_SEED_LENGTH;
> >-        kernel = g_realloc(kernel, kernel_size);
> >-        setup_data = (SetupData *)(kernel + setup_data_offset);
> >+    if (!legacy_no_rng_seed && protocol >= 0x209) {
> >+        setup_data_offset = cmdline_size;
> >+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
> >+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
> >+        setup_data = (void *)kernel_cmdline + setup_data_offset;
> >         setup_data->next = cpu_to_le64(first_setup_data);
> >-        first_setup_data = prot_addr + setup_data_offset;
> >+        first_setup_data = cmdline_addr + setup_data_offset;
> >         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
> >         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
> >         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
> >@@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms,
> >         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel,
> kernel_size);
> >     }
> >
> >+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
> >+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
> >+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline,
> cmdline_size);
> >+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
> >+    sev_load_ctx.cmdline_size = cmdline_size;
> >+
> >     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
> >     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
> >     sev_load_ctx.kernel_data = (char *)kernel;
> >@@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms,
> >      * kernel on the other side of the fw_cfg interface matches the hash
> of the
> >      * file the user passed in.
> >      */
> >-    if (!sev_enabled()) {
> >+    if (!sev_enabled() && first_setup_data) {
> >         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
> >
> >         memcpy(setup, header, MIN(sizeof(header), setup_size));
> >diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> >index a00881bc64..432754eda4 100644
> >--- a/hw/nvram/fw_cfg.c
> >+++ b/hw/nvram/fw_cfg.c
> >@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key,
> void *data, size_t len)
> >     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
> > }
> >
> >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
> >+{
> >+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
> >+
> >+    key &= FW_CFG_ENTRY_MASK;
> >+    assert(key < fw_cfg_max_entry(s));
> >+    return s->entries[arch][key].data;
> >+}
> >+
> > void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
> > {
> >     size_t sz = strlen(value) + 1;
>
> Saying they are "appended to" is wrong; the loader is free to put them
> anywhere in usable RAM that is not covered by the kernel image, the kernel
> keepout area, the command line or initrd.
>

No. That sentence describes what QEMU does currently, without this patch,
and it is accurate. This is not a kernel commit. It's a QEMU one.

(Also, this patch sat on the list for over a month, during which time you
could have chimed in. The ship for nitpicking has sailed now.)

Sorry for the HTML crud; I'll be mobile for another 10 days, without a real
computer.

Jason

>
Michael S. Tsirkin Jan. 31, 2023, 9:27 p.m. UTC | #4
On Tue, Jan 31, 2023 at 08:39:08PM +0100, Jason A. Donenfeld wrote:
> On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > 
> > The setup_data links are appended to the compressed kernel image. Since
> > the kernel image is typically loaded at 0x100000, setup_data lives at
> > `0x100000 + compressed_size`, which does not get relocated during the
> > kernel's boot process.
> > 
> > The kernel typically decompresses the image starting at address
> > 0x1000000 (note: there's one more zero there than the compressed image
> > above). This usually is fine for most kernels.
> > 
> > However, if the compressed image is actually quite large, then
> > setup_data will live at a `0x100000 + compressed_size` that extends into
> > the decompressed zone at 0x1000000. In other words, if compressed_size
> > is larger than `0x1000000 - 0x100000`, then the decompression step will
> > clobber setup_data, resulting in crashes.
> > 
> > Visually, what happens now is that QEMU appends setup_data to the kernel
> > image:
> > 
> >           kernel image            setup_data
> >    |--------------------------||----------------|
> > 0x100000                  0x100000+l1     0x100000+l1+l2
> > 
> > The problem is that this decompresses to 0x1000000 (one more zero). So
> > if l1 is > (0x1000000-0x100000), then this winds up looking like:
> > 
> >           kernel image            setup_data
> >    |--------------------------||----------------|
> > 0x100000                  0x100000+l1     0x100000+l1+l2
> > 
> >                                  d e c o m p r e s s e d   k e r n e l
> >                      |-------------------------------------------------------------|
> >                 0x1000000                                                     0x1000000+l3
> > 
> > The decompressed kernel seemingly overwriting the compressed kernel
> > image isn't a problem, because that gets relocated to a higher address
> > early on in the boot process, at the end of startup_64. setup_data,
> > however, stays in the same place, since those links are self referential
> > and nothing fixes them up.  So the decompressed kernel clobbers it.
> > 
> > Fix this by appending setup_data to the cmdline blob rather than the
> > kernel image blob, which remains at a lower address that won't get
> > clobbered.
> > 
> > This could have been done by overwriting the initrd blob instead, but
> > that poses big difficulties, such as no longer being able to use memory
> > mapped files for initrd, hurting performance, and, more importantly, the
> > initrd address calculation is hard coded in qboot, and it always grows
> > down rather than up, which means lots of brittle semantics would have to
> > be changed around, incurring more complexity. In contrast, using cmdline
> > is simple and doesn't interfere with anything.
> > 
> > The microvm machine has a gross hack where it fiddles with fw_cfg data
> > after the fact. So this hack is updated to account for this appending,
> > by reserving some bytes.
> > 
> > Fixup-by: Michael S. Tsirkin <mst@redhat.com>
> > Cc: x86@kernel.org
> > Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Eric Biggers <ebiggers@kernel.org>
> > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> > Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Tested-by: Eric Biggers <ebiggers@google.com>
> > Tested-by: Mathias Krause <minipli@grsecurity.net>
> 
> This one should wind up in the stable point release too. Dunno what the
> procedure for that is.
> 
> Jason

If you want that you need to include
    Cc: qemu-stable@nongnu.org
    Fixes: <hash> ("subject")

you can still reply to the original mail with this.
Jason A. Donenfeld Jan. 31, 2023, 11:32 p.m. UTC | #5
On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> 
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x100000, setup_data lives at
> `0x100000 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x1000000 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x100000 + compressed_size` that extends into
> the decompressed zone at 0x1000000. In other words, if compressed_size
> is larger than `0x1000000 - 0x100000`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>           kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
> The problem is that this decompresses to 0x1000000 (one more zero). So
> if l1 is > (0x1000000-0x100000), then this winds up looking like:
> 
>           kernel image            setup_data
>    |--------------------------||----------------|
> 0x100000                  0x100000+l1     0x100000+l1+l2
> 
>                                  d e c o m p r e s s e d   k e r n e l
>                      |-------------------------------------------------------------|
>                 0x1000000                                                     0x1000000+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Fixup-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: x86@kernel.org
> Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
> Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> Tested-by: Eric Biggers <ebiggers@google.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> ---
>  include/hw/i386/microvm.h |  5 ++--
>  include/hw/nvram/fw_cfg.h |  9 +++++++
>  hw/i386/microvm.c         | 15 +++++++----
>  hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
>  hw/nvram/fw_cfg.c         |  9 +++++++
>  5 files changed, 59 insertions(+), 31 deletions(-)

Cc: qemu-stable@nongnu.org
Fixes: 67f7e426 ("hw/i386: pass RNG seed via setup_data entry")
H. Peter Anvin Feb. 1, 2023, 5:40 a.m. UTC | #6
On January 31, 2023 1:22:43 PM PST, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>On Tue, Jan 31, 2023, 15:55 H. Peter Anvin <hpa@zytor.com> wrote:
>
>> On January 30, 2023 12:19:14 PM PST, "Michael S. Tsirkin" <mst@redhat.com>
>> wrote:
>> >From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>> >
>> >The setup_data links are appended to the compressed kernel image. Since
>> >the kernel image is typically loaded at 0x100000, setup_data lives at
>> >`0x100000 + compressed_size`, which does not get relocated during the
>> >kernel's boot process.
>> >
>> >The kernel typically decompresses the image starting at address
>> >0x1000000 (note: there's one more zero there than the compressed image
>> >above). This usually is fine for most kernels.
>> >
>> >However, if the compressed image is actually quite large, then
>> >setup_data will live at a `0x100000 + compressed_size` that extends into
>> >the decompressed zone at 0x1000000. In other words, if compressed_size
>> >is larger than `0x1000000 - 0x100000`, then the decompression step will
>> >clobber setup_data, resulting in crashes.
>> >
>> >Visually, what happens now is that QEMU appends setup_data to the kernel
>> >image:
>> >
>> >          kernel image            setup_data
>> >   |--------------------------||----------------|
>> >0x100000                  0x100000+l1     0x100000+l1+l2
>> >
>> >The problem is that this decompresses to 0x1000000 (one more zero). So
>> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
>> >
>> >          kernel image            setup_data
>> >   |--------------------------||----------------|
>> >0x100000                  0x100000+l1     0x100000+l1+l2
>> >
>> >                                 d e c o m p r e s s e d   k e r n e l
>> >
>>  |-------------------------------------------------------------|
>> >                0x1000000
>>      0x1000000+l3
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process, at the end of startup_64. setup_data,
>> >however, stays in the same place, since those links are self referential
>> >and nothing fixes them up.  So the decompressed kernel clobbers it.
>> >
>> >Fix this by appending setup_data to the cmdline blob rather than the
>> >kernel image blob, which remains at a lower address that won't get
>> >clobbered.
>> >
>> >This could have been done by overwriting the initrd blob instead, but
>> >that poses big difficulties, such as no longer being able to use memory
>> >mapped files for initrd, hurting performance, and, more importantly, the
>> >initrd address calculation is hard coded in qboot, and it always grows
>> >down rather than up, which means lots of brittle semantics would have to
>> >be changed around, incurring more complexity. In contrast, using cmdline
>> >is simple and doesn't interfere with anything.
>> >
>> >The microvm machine has a gross hack where it fiddles with fw_cfg data
>> >after the fact. So this hack is updated to account for this appending,
>> >by reserving some bytes.
>> >
>> >Fixup-by: Michael S. Tsirkin <mst@redhat.com>
>> >Cc: x86@kernel.org
>> >Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
>> >Cc: H. Peter Anvin <hpa@zytor.com>
>> >Cc: Borislav Petkov <bp@alien8.de>
>> >Cc: Eric Biggers <ebiggers@kernel.org>
>> >Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>> >Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
>> >Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
>> >Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> >Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> >Tested-by: Eric Biggers <ebiggers@google.com>
>> >Tested-by: Mathias Krause <minipli@grsecurity.net>
>> >---
>> > include/hw/i386/microvm.h |  5 ++--
>> > include/hw/nvram/fw_cfg.h |  9 +++++++
>> > hw/i386/microvm.c         | 15 +++++++----
>> > hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
>> > hw/nvram/fw_cfg.c         |  9 +++++++
>> > 5 files changed, 59 insertions(+), 31 deletions(-)
>> >
>> >diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
>> >index fad97a891d..e8af61f194 100644
>> >--- a/include/hw/i386/microvm.h
>> >+++ b/include/hw/i386/microvm.h
>> >@@ -50,8 +50,9 @@
>> >  */
>> >
>> > /* Platform virtio definitions */
>> >-#define VIRTIO_MMIO_BASE      0xfeb00000
>> >-#define VIRTIO_CMDLINE_MAXLEN 64
>> >+#define VIRTIO_MMIO_BASE                0xfeb00000
>> >+#define VIRTIO_CMDLINE_MAXLEN           64
>> >+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) *
>> 16)
>> >
>> > #define GED_MMIO_BASE         0xfea00000
>> > #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
>> >diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
>> >index 2e503904dc..990dcdbb2e 100644
>> >--- a/include/hw/nvram/fw_cfg.h
>> >+++ b/include/hw/nvram/fw_cfg.h
>> >@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s,
>> uint16_t key,
>> >                                void *data, size_t len,
>> >                                bool read_only);
>> >
>> >+/**
>> >+ * fw_cfg_read_bytes_ptr:
>> >+ * @s: fw_cfg device being modified
>> >+ * @key: selector key value for new fw_cfg item
>> >+ *
>> >+ * Reads an existing fw_cfg data pointer.
>> >+ */
>> >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
>> >+
>> > /**
>> >  * fw_cfg_add_string:
>> >  * @s: fw_cfg device being modified
>> >diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
>> >index 170a331e3f..29f30dd6d3 100644
>> >--- a/hw/i386/microvm.c
>> >+++ b/hw/i386/microvm.c
>> >@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState
>> *machine)
>> >     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
>> >     BusState *bus;
>> >     BusChild *kid;
>> >-    char *cmdline;
>> >+    char *cmdline, *existing_cmdline;
>> >+    size_t len;
>> >
>> >     /*
>> >      * Find MMIO transports with attached devices, and add them to the
>> kernel
>> >@@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState
>> *machine)
>> >      * Yes, this is a hack, but one that heavily improves the UX without
>> >      * introducing any significant issues.
>> >      */
>> >-    cmdline = g_strdup(machine->kernel_cmdline);
>> >+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg,
>> FW_CFG_CMDLINE_DATA);
>> >+    cmdline = g_strdup(existing_cmdline);
>> >     bus = sysbus_get_default();
>> >     QTAILQ_FOREACH(kid, &bus->children, sibling) {
>> >         DeviceState *dev = kid->child;
>> >@@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState
>> *machine)
>> >         }
>> >     }
>> >
>> >-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE,
>> strlen(cmdline) + 1);
>> >-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
>> >-
>> >+    len = strlen(cmdline);
>> >+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
>> >+        fprintf(stderr, "qemu: virtio mmio cmdline too large,
>> skipping\n");
>> >+    } else {
>> >+        memcpy(existing_cmdline, cmdline, len + 1);
>> >+    }
>> >     g_free(cmdline);
>> > }
>> >
>> >diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> >index 78cc131926..eaff4227bd 100644
>> >--- a/hw/i386/x86.c
>> >+++ b/hw/i386/x86.c
>> >@@ -50,6 +50,7 @@
>> > #include "hw/intc/i8259.h"
>> > #include "hw/rtc/mc146818rtc.h"
>> > #include "target/i386/sev.h"
>> >+#include "hw/i386/microvm.h"
>> >
>> > #include "hw/acpi/cpu_hotplug.h"
>> > #include "hw/irq.h"
>> >@@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms,
>> >     const char *kernel_filename = machine->kernel_filename;
>> >     const char *initrd_filename = machine->initrd_filename;
>> >     const char *dtb_filename = machine->dtb;
>> >-    const char *kernel_cmdline = machine->kernel_cmdline;
>> >+    char *kernel_cmdline;
>> >     SevKernelLoaderContext sev_load_ctx = {};
>> >     enum { RNG_SEED_LENGTH = 32 };
>> >
>> >-    /* Align to 16 bytes as a paranoia measure */
>> >-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
>> >+    /*
>> >+     * Add the NUL terminator, some padding for the microvm cmdline
>> fiddling
>> >+     * hack, and then align to 16 bytes as a paranoia measure
>> >+     */
>> >+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
>> >+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
>> >+    /* Make a copy, since we might append arbitrary bytes to it later. */
>> >+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
>> >
>> >     /* load the kernel header */
>> >     f = fopen(kernel_filename, "rb");
>> >@@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms,
>> >         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
>> >     }
>> >
>> >-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>> >-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) +
>> 1);
>> >-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
>> >-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
>> >-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
>> >-
>> >     if (protocol >= 0x202) {
>> >         stl_p(header + 0x228, cmdline_addr);
>> >     } else {
>> >@@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms,
>> >             exit(1);
>> >         }
>> >
>> >-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>> >-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
>> >-        kernel = g_realloc(kernel, kernel_size);
>> >-
>> >-
>> >-        setup_data = (SetupData *)(kernel + setup_data_offset);
>> >+        setup_data_offset = cmdline_size;
>> >+        cmdline_size += sizeof(SetupData) + dtb_size;
>> >+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
>> >+        setup_data = (void *)kernel_cmdline + setup_data_offset;
>> >         setup_data->next = cpu_to_le64(first_setup_data);
>> >-        first_setup_data = prot_addr + setup_data_offset;
>> >+        first_setup_data = cmdline_addr + setup_data_offset;
>> >         setup_data->type = cpu_to_le32(SETUP_DTB);
>> >         setup_data->len = cpu_to_le32(dtb_size);
>> >-
>> >         load_image_size(dtb_filename, setup_data->data, dtb_size);
>> >     }
>> >
>> >-    if (!legacy_no_rng_seed) {
>> >-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
>> >-        kernel_size = setup_data_offset + sizeof(SetupData) +
>> RNG_SEED_LENGTH;
>> >-        kernel = g_realloc(kernel, kernel_size);
>> >-        setup_data = (SetupData *)(kernel + setup_data_offset);
>> >+    if (!legacy_no_rng_seed && protocol >= 0x209) {
>> >+        setup_data_offset = cmdline_size;
>> >+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
>> >+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
>> >+        setup_data = (void *)kernel_cmdline + setup_data_offset;
>> >         setup_data->next = cpu_to_le64(first_setup_data);
>> >-        first_setup_data = prot_addr + setup_data_offset;
>> >+        first_setup_data = cmdline_addr + setup_data_offset;
>> >         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
>> >         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
>> >         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
>> >@@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms,
>> >         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel,
>> kernel_size);
>> >     }
>> >
>> >+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
>> >+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
>> >+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline,
>> cmdline_size);
>> >+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
>> >+    sev_load_ctx.cmdline_size = cmdline_size;
>> >+
>> >     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
>> >     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
>> >     sev_load_ctx.kernel_data = (char *)kernel;
>> >@@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms,
>> >      * kernel on the other side of the fw_cfg interface matches the hash
>> of the
>> >      * file the user passed in.
>> >      */
>> >-    if (!sev_enabled()) {
>> >+    if (!sev_enabled() && first_setup_data) {
>> >         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
>> >
>> >         memcpy(setup, header, MIN(sizeof(header), setup_size));
>> >diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>> >index a00881bc64..432754eda4 100644
>> >--- a/hw/nvram/fw_cfg.c
>> >+++ b/hw/nvram/fw_cfg.c
>> >@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key,
>> void *data, size_t len)
>> >     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
>> > }
>> >
>> >+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
>> >+{
>> >+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
>> >+
>> >+    key &= FW_CFG_ENTRY_MASK;
>> >+    assert(key < fw_cfg_max_entry(s));
>> >+    return s->entries[arch][key].data;
>> >+}
>> >+
>> > void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
>> > {
>> >     size_t sz = strlen(value) + 1;
>>
>> Saying they are "appended to" is wrong; the loader is free to put them
>> anywhere in usable RAM that is not covered by the kernel image, the kernel
>> keepout area, the command line or initrd.
>>
>
>No. That sentence describes what QEMU does currently, without this patch,
>and it is accurate. This is not a kernel commit. It's a QEMU one.
>
>(Also, this patch sat on the list for over a month, during which time you
>could have chimed in. The ship for nitpicking has sailed now.)
>
>Sorry for the HTML crud; I'll be mobile for another 10 days, without a real
>computer.
>
>Jason
>
>>

Oh, right. Sorry.
diff mbox series

Patch

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index fad97a891d..e8af61f194 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -50,8 +50,9 @@ 
  */
 
 /* Platform virtio definitions */
-#define VIRTIO_MMIO_BASE      0xfeb00000
-#define VIRTIO_CMDLINE_MAXLEN 64
+#define VIRTIO_MMIO_BASE                0xfeb00000
+#define VIRTIO_CMDLINE_MAXLEN           64
+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) * 16)
 
 #define GED_MMIO_BASE         0xfea00000
 #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 2e503904dc..990dcdbb2e 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -139,6 +139,15 @@  void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
                                void *data, size_t len,
                                bool read_only);
 
+/**
+ * fw_cfg_read_bytes_ptr:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ *
+ * Reads an existing fw_cfg data pointer.
+ */
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
+
 /**
  * fw_cfg_add_string:
  * @s: fw_cfg device being modified
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..29f30dd6d3 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -378,7 +378,8 @@  static void microvm_fix_kernel_cmdline(MachineState *machine)
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     BusState *bus;
     BusChild *kid;
-    char *cmdline;
+    char *cmdline, *existing_cmdline;
+    size_t len;
 
     /*
      * Find MMIO transports with attached devices, and add them to the kernel
@@ -387,7 +388,8 @@  static void microvm_fix_kernel_cmdline(MachineState *machine)
      * Yes, this is a hack, but one that heavily improves the UX without
      * introducing any significant issues.
      */
-    cmdline = g_strdup(machine->kernel_cmdline);
+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA);
+    cmdline = g_strdup(existing_cmdline);
     bus = sysbus_get_default();
     QTAILQ_FOREACH(kid, &bus->children, sibling) {
         DeviceState *dev = kid->child;
@@ -411,9 +413,12 @@  static void microvm_fix_kernel_cmdline(MachineState *machine)
         }
     }
 
-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
-
+    len = strlen(cmdline);
+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
+        fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n");
+    } else {
+        memcpy(existing_cmdline, cmdline, len + 1);
+    }
     g_free(cmdline);
 }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..eaff4227bd 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -50,6 +50,7 @@ 
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "hw/i386/microvm.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -813,12 +814,18 @@  void x86_load_linux(X86MachineState *x86ms,
     const char *kernel_filename = machine->kernel_filename;
     const char *initrd_filename = machine->initrd_filename;
     const char *dtb_filename = machine->dtb;
-    const char *kernel_cmdline = machine->kernel_cmdline;
+    char *kernel_cmdline;
     SevKernelLoaderContext sev_load_ctx = {};
     enum { RNG_SEED_LENGTH = 32 };
 
-    /* Align to 16 bytes as a paranoia measure */
-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
+    /*
+     * Add the NUL terminator, some padding for the microvm cmdline fiddling
+     * hack, and then align to 16 bytes as a paranoia measure
+     */
+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
+    /* Make a copy, since we might append arbitrary bytes to it later. */
+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
 
     /* load the kernel header */
     f = fopen(kernel_filename, "rb");
@@ -959,12 +966,6 @@  void x86_load_linux(X86MachineState *x86ms,
         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
     }
 
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
-
     if (protocol >= 0x202) {
         stl_p(header + 0x228, cmdline_addr);
     } else {
@@ -1091,27 +1092,24 @@  void x86_load_linux(X86MachineState *x86ms,
             exit(1);
         }
 
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
-        kernel = g_realloc(kernel, kernel_size);
-
-
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + dtb_size;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_DTB);
         setup_data->len = cpu_to_le32(dtb_size);
-
         load_image_size(dtb_filename, setup_data->data, dtb_size);
     }
 
-    if (!legacy_no_rng_seed) {
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + RNG_SEED_LENGTH;
-        kernel = g_realloc(kernel, kernel_size);
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+    if (!legacy_no_rng_seed && protocol >= 0x209) {
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
@@ -1122,6 +1120,12 @@  void x86_load_linux(X86MachineState *x86ms,
         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
     }
 
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, cmdline_size);
+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
+    sev_load_ctx.cmdline_size = cmdline_size;
+
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
     sev_load_ctx.kernel_data = (char *)kernel;
@@ -1134,7 +1138,7 @@  void x86_load_linux(X86MachineState *x86ms,
      * kernel on the other side of the fw_cfg interface matches the hash of the
      * file the user passed in.
      */
-    if (!sev_enabled()) {
+    if (!sev_enabled() && first_setup_data) {
         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
 
         memcpy(setup, header, MIN(sizeof(header), setup_size));
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a00881bc64..432754eda4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -741,6 +741,15 @@  void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
 }
 
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
+{
+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
+
+    key &= FW_CFG_ENTRY_MASK;
+    assert(key < fw_cfg_max_entry(s));
+    return s->entries[arch][key].data;
+}
+
 void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
 {
     size_t sz = strlen(value) + 1;