Message ID | da39abab9785aea2a2e7652ed6403b6268aeb31f.camel@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86: fix q35 kernel measurements broken due to rng seeding | expand |
On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote: > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed > via setup_data entry") modifies the kernel image file to append a > random seed. Obviously this makes the hash of the kernel file > non-deterministic and so breaks both measured and some signed boots. I recall raising that at the time https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html and Jason pointed me to a followup which I tested and believe fixed it for SEV: https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html but it doesn't look like that second patch ever merged. We went through so many patches I think it probably got obsoleted by something else, and no one rechecked SEV again. > The commit notes it's only for non-EFI (because EFI has a different > RNG seeding mechanism) so, since there are no non-EFI q35 systems, this > should be disabled for the whole of the q35 machine type to bring back > deterministic kernel file hashes. SeaBIOS is the default firmware for both q35 and i440fx. The majority of systems using q35 will be non-EFI today, and that is what the random seed was intended to address. I don't think we can just disable this for the whole of q35. When you say it breaks measured / signed boots, I presume you are specifically referring to SEV kernel hashes measurements ? Or is there a more general problem to solve ? > Obviously this still leaves the legacy bios case broken for at least > measured boot, but I don't think anyone cares about that now. > > Signed-off-by: James Bottomley <jejb@linux.ibm.com> > --- > hw/i386/pc_q35.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 83c57c6eb1..11e8dd7ca7 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m) > pcmc->default_nic_model = "e1000e"; > pcmc->pci_root_uid = 0; > pcmc->default_cpu_version = 1; > + pcmc->legacy_no_rng_seed = true; > > m->family = "pc_q35"; > m->desc = "Standard PC (Q35 + ICH9, 2009)"; > @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL, > > static void pc_q35_7_1_machine_options(MachineClass *m) > { > - PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > pc_q35_7_2_machine_options(m); > - pcmc->legacy_no_rng_seed = true; > compat_props_add(m->compat_props, hw_compat_7_1, > hw_compat_7_1_len); > compat_props_add(m->compat_props, pc_compat_7_1, > pc_compat_7_1_len); > } This patch changes behaviour of the pc-q35-7.2 machine type. Any change will need to be in latest development 8.0 machine type only With regards, Daniel
On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote: > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote: > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG > > seed > > via setup_data entry") modifies the kernel image file to append a > > random seed. Obviously this makes the hash of the kernel file > > non-deterministic and so breaks both measured and some signed > > boots. > > I recall raising that at the time > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html > > and Jason pointed me to a followup which I tested and believe > fixed it for SEV: > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html > > but it doesn't look like that second patch ever merged. We went > through so many patches I think it probably got obsoleted by > something else, and no one rechecked SEV again. The kernel file problem is a pretty huge one. OVMF lays it down on an internal file system and without the second patch, it now contains random junk at the end. Anything that hashes the whole file (which includes not only the measured direct boot but also grub signatures and probably other bootloader signing mechanisms) will have an issue. > > The commit notes it's only for non-EFI (because EFI has a different > > RNG seeding mechanism) so, since there are no non-EFI q35 systems, > > this should be disabled for the whole of the q35 machine type to > > bring back deterministic kernel file hashes. > > SeaBIOS is the default firmware for both q35 and i440fx. The > majority of systems using q35 will be non-EFI today, and that > is what the random seed was intended to address. I don't think > we can just disable this for the whole of q35. > > When you say it breaks measured / signed boots, I presume you > are specifically referring to SEV kernel hashes measurements ? > Or is there a more general problem to solve ? No it generally breaks measured/signed boots because it adds random junk to the kernel file. The second patch will fix this if you apply it because setup data isn't measured or signed (yet ... however see the linux-coco debate about how it should be). I also note there was a v3 of the patch and considerable discussion saying it couldn't work: https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/ Which is likely why it never went in ... although the discussion does seem to resolve towards the end. James
This is already fixed via the patch that MST just sent in his pull. So wait a few days for that to be merged and it'll be all set. No need for this patch here. Do not merge. On Wed, Feb 1, 2023, 08:57 James Bottomley <jejb@linux.ibm.com> wrote: > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed > via setup_data entry") modifies the kernel image file to append a > random seed. Obviously this makes the hash of the kernel file > non-deterministic and so breaks both measured and some signed boots. > The commit notes it's only for non-EFI (because EFI has a different > RNG seeding mechanism) so, since there are no non-EFI q35 systems, this > should be disabled for the whole of the q35 machine type to bring back > deterministic kernel file hashes. > > Obviously this still leaves the legacy bios case broken for at least > measured boot, but I don't think anyone cares about that now. > > Signed-off-by: James Bottomley <jejb@linux.ibm.com> > --- > hw/i386/pc_q35.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c > index 83c57c6eb1..11e8dd7ca7 100644 > --- a/hw/i386/pc_q35.c > +++ b/hw/i386/pc_q35.c > @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m) > pcmc->default_nic_model = "e1000e"; > pcmc->pci_root_uid = 0; > pcmc->default_cpu_version = 1; > + pcmc->legacy_no_rng_seed = true; > > m->family = "pc_q35"; > m->desc = "Standard PC (Q35 + ICH9, 2009)"; > @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL, > > static void pc_q35_7_1_machine_options(MachineClass *m) > { > - PCMachineClass *pcmc = PC_MACHINE_CLASS(m); > pc_q35_7_2_machine_options(m); > - pcmc->legacy_no_rng_seed = true; > compat_props_add(m->compat_props, hw_compat_7_1, > hw_compat_7_1_len); > compat_props_add(m->compat_props, pc_compat_7_1, > pc_compat_7_1_len); > } > -- > 2.35.3 > > >
This patch is not needed. It is already fixed in a pending pull. Do not merge. On Wed, Feb 1, 2023, 09:57 James Bottomley <jejb@linux.ibm.com> wrote: > On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote: > > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote: > > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG > > > seed > > > via setup_data entry") modifies the kernel image file to append a > > > random seed. Obviously this makes the hash of the kernel file > > > non-deterministic and so breaks both measured and some signed > > > boots. > > > > I recall raising that at the time > > > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html > > > > and Jason pointed me to a followup which I tested and believe > > fixed it for SEV: > > > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html > > > > but it doesn't look like that second patch ever merged. We went > > through so many patches I think it probably got obsoleted by > > something else, and no one rechecked SEV again. > > The kernel file problem is a pretty huge one. OVMF lays it down on an > internal file system and without the second patch, it now contains > random junk at the end. Anything that hashes the whole file (which > includes not only the measured direct boot but also grub signatures and > probably other bootloader signing mechanisms) will have an issue. > > > > The commit notes it's only for non-EFI (because EFI has a different > > > RNG seeding mechanism) so, since there are no non-EFI q35 systems, > > > this should be disabled for the whole of the q35 machine type to > > > bring back deterministic kernel file hashes. > > > > SeaBIOS is the default firmware for both q35 and i440fx. The > > majority of systems using q35 will be non-EFI today, and that > > is what the random seed was intended to address. I don't think > > we can just disable this for the whole of q35. > > > > When you say it breaks measured / signed boots, I presume you > > are specifically referring to SEV kernel hashes measurements ? > > Or is there a more general problem to solve ? > > No it generally breaks measured/signed boots because it adds random > junk to the kernel file. The second patch will fix this if you apply > it because setup data isn't measured or signed (yet ... however see the > linux-coco debate about how it should be). > > I also note there was a v3 of the patch and considerable discussion > saying it couldn't work: > > https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/ > > Which is likely why it never went in ... although the discussion does > seem to resolve towards the end. > > James > >
On Wed, Feb 01, 2023 at 09:56:35AM -0500, James Bottomley wrote: > On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote: > > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote: > > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG > > > seed > > > via setup_data entry") modifies the kernel image file to append a > > > random seed. Obviously this makes the hash of the kernel file > > > non-deterministic and so breaks both measured and some signed > > > boots. > > > > I recall raising that at the time > > > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html > > > > and Jason pointed me to a followup which I tested and believe > > fixed it for SEV: > > > > https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html > > > > but it doesn't look like that second patch ever merged. We went > > through so many patches I think it probably got obsoleted by > > something else, and no one rechecked SEV again. > > The kernel file problem is a pretty huge one. OVMF lays it down on an > internal file system and without the second patch, it now contains > random junk at the end. Anything that hashes the whole file (which > includes not only the measured direct boot but also grub signatures and > probably other bootloader signing mechanisms) will have an issue. > > > > The commit notes it's only for non-EFI (because EFI has a different > > > RNG seeding mechanism) so, since there are no non-EFI q35 systems, > > > this should be disabled for the whole of the q35 machine type to > > > bring back deterministic kernel file hashes. > > > > SeaBIOS is the default firmware for both q35 and i440fx. The > > majority of systems using q35 will be non-EFI today, and that > > is what the random seed was intended to address. I don't think > > we can just disable this for the whole of q35. > > > > When you say it breaks measured / signed boots, I presume you > > are specifically referring to SEV kernel hashes measurements ? > > Or is there a more general problem to solve ? > > No it generally breaks measured/signed boots because it adds random > junk to the kernel file. The second patch will fix this if you apply > it because setup data isn't measured or signed (yet ... however see the > linux-coco debate about how it should be). BTW, I can't find a reference now, but I recall being told that when using -kernel, OVMF won't enforce SecureBoot. ie it'll do the checks but ignore any failure and carry on. It should still be reflected in the vTPM measurements though. > I also note there was a v3 of the patch and considerable discussion > saying it couldn't work: > > https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/ > > Which is likely why it never went in ... although the discussion does > seem to resolve towards the end. With regards, Daniel
On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote: > This is already fixed via the patch that MST just sent in his pull. > So wait a few days for that to be merged and it'll be all set. > > No need for this patch here. Do not merge. If it's not a secret, would it be too much trouble to point to the branch so we can actually test it? It does seem that the biggest problem this issue shows is that there wasn't wide enough configuration testing done on the prior commits before they were merged. James
Hi Jason, James, On 01/02/2023 17:24, James Bottomley wrote: > On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote: >> This is already fixed via the patch that MST just sent in his pull. >> So wait a few days for that to be merged and it'll be all set. >> >> No need for this patch here. Do not merge. > > If it's not a secret, would it be too much trouble to point to the > branch so we can actually test it? It does seem that the biggest > problem this issue shows is that there wasn't wide enough configuration > testing done on the prior commits before they were merged. > I assume it is: ---- ... are available in the Git repository at: https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream for you to fetch changes up to f5cb612867d3b10b86d6361ba041767e02c1b127: docs/pcie.txt: Replace ioh3420 with pcie-root-port (2023-01-28 06:21:30 -0500) ---- I checked out this branch and started an SEV guest with measured boot and it fails during hash verification in OVMF: BlobVerifierLibSevHashesConstructor: Found injected hashes table in secure location VerifyBlob: Found GUID 4DE79437-ABD2-427F-B835-D5B172D2045B in table VerifyBlob: Hash comparison succeeded for "kernel" VerifyBlob: Found GUID 44BAF731-3A2F-4BD7-9AF1-41E29169781D in table VerifyBlob: Hash comparison succeeded for "initrd" VerifyBlob: Found GUID 97D02DD8-BD20-4C94-AA78-E7714D36AB2A in table VerifyBlob: Hash comparison failed for "cmdline" (before that patch it was failing on the "kernel" hash.) I haven't yet examined the suggested fix patch ("[PULL 10/56] x86: don't let decompressed kernel image clobber setup_data") - just ran the simple test above. -Dov
On Wed, 1 Feb 2023 at 15:25, James Bottomley <jejb@linux.ibm.com> wrote: > > On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote: > > This is already fixed via the patch that MST just sent in his pull. > > So wait a few days for that to be merged and it'll be all set. > > > > No need for this patch here. Do not merge. > > If it's not a secret, would it be too much trouble to point to the > branch so we can actually test it? It does seem that the biggest > problem this issue shows is that there wasn't wide enough configuration > testing done on the prior commits before they were merged. In general you shouldn't expect commits to be visible in a git branch before they get merged -- the QEMU process is not exactly identical to the kernel one. For a particular patch on the mailing list, you can get a git branch with it applied by looking for the patch in https://patchew.org/QEMU/ if that's more convenient than just applying it by hand. We also don't tend to want patches hanging around for testing before they get merged[*] -- we figure that "in upstream git" is the place that actually gets tested in practice; almost nobody will be working with or testing anything else. [*] The fix Jason refers to here that's in MST's pullreq unfortunately hasn't made it upstream as quickly as I would like, due to a combination of things including us having to pause CI for a week when we ran out of minutes. thanks -- PMM
It's not a secret, but I have so little internet right now that I can't even load a webpage, and I'm on my phone, hence the short HTMLified emails. In brief, though, it gets rid of all modifications to the kernel image all together, so it should fix your issue. On Wed, Feb 1, 2023, 10:24 James Bottomley <jejb@linux.ibm.com> wrote: > On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote: > > This is already fixed via the patch that MST just sent in his pull. > > So wait a few days for that to be merged and it'll be all set. > > > > No need for this patch here. Do not merge. > > If it's not a secret, would it be too much trouble to point to the > branch so we can actually test it? It does seem that the biggest > problem this issue shows is that there wasn't wide enough configuration > testing done on the prior commits before they were merged. > > James > >
On Wed, 2023-02-01 at 16:50 +0000, Peter Maydell wrote: > On Wed, 1 Feb 2023 at 15:25, James Bottomley <jejb@linux.ibm.com> > wrote: > > > > On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote: > > > This is already fixed via the patch that MST just sent in his > > > pull. > > > So wait a few days for that to be merged and it'll be all set. > > > > > > No need for this patch here. Do not merge. > > > > If it's not a secret, would it be too much trouble to point to the > > branch so we can actually test it? It does seem that the biggest > > problem this issue shows is that there wasn't wide enough > > configuration > > testing done on the prior commits before they were merged. > > In general you shouldn't expect commits to be visible in > a git branch before they get merged -- the QEMU process > is not exactly identical to the kernel one. > > For a particular patch on the mailing list, you can get a git branch > with it applied by looking for the patch in https://patchew.org/QEMU/ > if that's more convenient than just applying it by hand. The real issue is there have been so many patches flying around for this, it's not clear exactly what combination needed to be tested. Dov found the branch in tags/for_upstream but it's still failing measured direct boot, although the failure has shifted from the hash check of the kernel to the hash check of the command line. All I really wanted was a link to the patch ... I don't need the tree because inspection will tell me that it adds unexpected data at the end of an integrity checked file. > We also don't tend to want patches hanging around for testing > before they get merged[*] -- we figure that "in upstream git" > is the place that actually gets tested in practice; almost > nobody will be working with or testing anything else. > > [*] The fix Jason refers to here that's in MST's pullreq > unfortunately hasn't made it upstream as quickly as I > would like, due to a combination of things including us > having to pause CI for a week when we ran out of minutes. OK, so the problem is still the same as it was before: adding unexpected data to an integrity checked file. I don't get why there's all this dancing around trying to find space. Surely when the parameter was added, since it was a fixed size, the kernel header was expanded and the boot protocol version bumped? So we can use that to identify kernels which can use this property and have the space to insert it directly. James
On Wed, 2023-02-01 at 12:51 -0500, Jason A. Donenfeld wrote: > It's not a secret, but I have so little internet right now that I > can't even load a webpage, and I'm on my phone, hence the short > HTMLified emails. > > In brief, though, it gets rid of all modifications to the kernel > image all together, so it should fix your issue. We've already tested it and established it doesn't because you simply added your rng data to the end of a different integrity protected file which now fails the integrity check instead of the kernel. I checked the kernel source as well; I thought you'd have done the usual thing and bumped the boot protocol version to steal space in __pad9, but you didn't apparently. To fix this up after the fact, I recommend that we still steal space in _pad9[] but we make it have enough space for a setup_data header as well as the 32 random bytes, so we've officially reserved the space, but in earlier kernels than this change gets to you can still use the setup_data_offset method, except that it now uses the empty space in _pad9 via the setup_data mechanism. That should find you space and get you out of having to expand any integrity protected files. The SEV direct boot will still work because there's a check further down that doesn't copy the modified header back over the kernel because it is ignored on efi stub boot anyway. James
Hi James, On Wed, Feb 1, 2023, 15:39 James Bottomley <jejb@linux.ibm.com> wrote: > On Wed, 2023-02-01 at 12:51 -0500, Jason A. Donenfeld wrote: > > It's not a secret, but I have so little internet right now that I > > can't even load a webpage, and I'm on my phone, hence the short > > HTMLified emails. > > > > In brief, though, it gets rid of all modifications to the kernel > > image all together, so it should fix your issue. > > We've already tested it and established it doesn't because you simply > added your rng data to the end of a different integrity protected file > which now fails the integrity check instead of the kernel. > > I checked the kernel source as well; I thought you'd have done the > usual thing and bumped the boot protocol version to steal space in > __pad9, but you didn't apparently. To fix this up after the fact, I > recommend that we still steal space in _pad9[] but we make it have > enough space for a setup_data header as well as the 32 random bytes, so > we've officially reserved the space, but in earlier kernels than this > change gets to you can still use the setup_data_offset method, except > that it now uses the empty space in _pad9 via the setup_data mechanism. > That should find you space and get you out of having to expand any > integrity protected files. The SEV direct boot will still work because > there's a check further down that doesn't copy the modified header back > over the kernel because it is ignored on efi stub boot anyway. Ahh, it looks like there's now an integrity check on the cmdline file. Darn. The patch in that PULL is still good and necessary and fixed a *different* bug, though. So we should still move forward on that. But it sounds like you might now have a concrete suggestion on something even better. I'm CCing hpa, as this is his wheelhouse, and maybe you two can divise the next step while I'm away. Maybe the pad9 thing you mentioned is the super nice solution we've been searching for this whole time. When I'm home in 10 days and have internet again, I'll take a look at where thing's are out and try to figure out how I can be productive again with it. And sorry again for the short HTML emails. A day ago I was using mosh from my phone to use mutt on a server to reply to emails downloaded from lore. But today the cloud cover means the best I can do is queue these up in the Android gmail client and hope they eventually send. Jason
On Wed, 2023-02-01 at 15:48 -0500, Jason A. Donenfeld wrote: [...] > But it sounds like you might now have a concrete suggestion on > something even better. I'm CCing hpa, as this is his wheelhouse, and > maybe you two can divise the next step while I'm away. Maybe the pad9 > thing you mentioned is the super nice solution we've been searching > for this whole time. When I'm home in 10 days and have internet > again, I'll take a look at where thing's are out and try to figure > out how I can be productive again with it. OK, so just FYI HPA, this is the patch I'm thinking of sending to linux-kernel to reserve space in struct boot_params for this. If you could take a look and advise on the location before I send the final patch, I'd be grateful. I took space in _pad9 because that's the standard method (add on to end), but it does strike me we could also use all of _pad8 for the (the addition is only 48 bytes) or even _pad3 + hd0_info + hd1_info. James --- diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S index 9338c68e7413..0120ab77dac9 100644 --- a/arch/x86/boot/header.S +++ b/arch/x86/boot/header.S @@ -308,7 +308,7 @@ _start: # Part 2 of the header, from the old setup.S .ascii "HdrS" # header signature - .word 0x020f # header version number (>= 0x0105) + .word 0x0210 # header version number (>= 0x0105) # or else old loadlin-1.5 will fail) .globl realmode_swtch realmode_swtch: .word 0, 0 # default_switch, SETUPSEG diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index 01d19fc22346..c614ff0755f2 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -181,6 +181,19 @@ struct ima_setup_data { __u64 size; } __attribute__((packed)); +/* + * Define a boot_param area for the RNG seed which can be used via the + * setup_data mechanism (so must have a setup_data header) but which + * is embedded in boot_params because qemu has been unable to find + * a safe data space for it. The value RNG_SEED_LENGTH must not + * change (pad length dependent on it) and must match the value in QEMU + */ +#define RNG_SEED_LENGTH 32 +struct random_seed_data { + struct setup_data s; + __u8 data[RNG_SEED_LENGTH]; +} __attribute__((packed)); + /* The so-called "zeropage" */ struct boot_params { struct screen_info screen_info; /* 0x000 */ @@ -228,7 +241,8 @@ struct boot_params { struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ __u8 _pad8[48]; /* 0xcd0 */ struct edd_info eddbuf[EDDMAXNR]; /* 0xd00 */ - __u8 _pad9[276]; /* 0xeec */ + struct random_seed_data random_seed; /* 0xeec */ + __u8 _pad9[228]; /* 0xf1c */ } __attribute__((packed)); /** diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index 6b58610a1552..fb719682579d 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -110,13 +110,10 @@ static int setup_e820_entries(struct boot_params *params) return 0; } -enum { RNG_SEED_LENGTH = 32 }; - static void -setup_rng_seed(struct boot_params *params, unsigned long params_load_addr, - unsigned int rng_seed_setup_data_offset) +setup_rng_seed(struct boot_params *params, unsigned long params_load_addr) { - struct setup_data *sd = (void *)params + rng_seed_setup_data_offset; + struct setup_data *sd = ¶ms->random_seed.s; unsigned long setup_data_phys; if (!rng_is_initialized()) @@ -125,7 +122,8 @@ setup_rng_seed(struct boot_params *params, unsigned long params_load_addr, sd->type = SETUP_RNG_SEED; sd->len = RNG_SEED_LENGTH; get_random_bytes(sd->data, RNG_SEED_LENGTH); - setup_data_phys = params_load_addr + rng_seed_setup_data_offset; + setup_data_phys = params_load_addr + offsetof(struct boot_params, + random_seed); sd->next = params->hdr.setup_data; params->hdr.setup_data = setup_data_phys; } @@ -306,7 +304,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params, } /* Setup RNG seed */ - setup_rng_seed(params, params_load_addr, setup_data_offset); + setup_rng_seed(params, params_load_addr); /* Setup EDD info */ memcpy(params->eddbuf, boot_params.eddbuf,
On February 2, 2023 6:38:12 AM PST, James Bottomley <jejb@linux.ibm.com> wrote: >On Wed, 2023-02-01 at 15:48 -0500, Jason A. Donenfeld wrote: >[...] >> But it sounds like you might now have a concrete suggestion on >> something even better. I'm CCing hpa, as this is his wheelhouse, and >> maybe you two can divise the next step while I'm away. Maybe the pad9 >> thing you mentioned is the super nice solution we've been searching >> for this whole time. When I'm home in 10 days and have internet >> again, I'll take a look at where thing's are out and try to figure >> out how I can be productive again with it. > >OK, so just FYI HPA, this is the patch I'm thinking of sending to >linux-kernel to reserve space in struct boot_params for this. If you >could take a look and advise on the location before I send the final >patch, I'd be grateful. I took space in _pad9 because that's the >standard method (add on to end), but it does strike me we could also >use all of _pad8 for the (the addition is only 48 bytes) or even _pad3 >+ hd0_info + hd1_info. > >James > >--- > >diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S >index 9338c68e7413..0120ab77dac9 100644 >--- a/arch/x86/boot/header.S >+++ b/arch/x86/boot/header.S >@@ -308,7 +308,7 @@ _start: > # Part 2 of the header, from the old setup.S > > .ascii "HdrS" # header signature >- .word 0x020f # header version number (>= 0x0105) >+ .word 0x0210 # header version number (>= 0x0105) > # or else old loadlin-1.5 will fail) > .globl realmode_swtch > realmode_swtch: .word 0, 0 # default_switch, SETUPSEG >diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h >index 01d19fc22346..c614ff0755f2 100644 >--- a/arch/x86/include/uapi/asm/bootparam.h >+++ b/arch/x86/include/uapi/asm/bootparam.h >@@ -181,6 +181,19 @@ struct ima_setup_data { > __u64 size; > } __attribute__((packed)); > >+/* >+ * Define a boot_param area for the RNG seed which can be used via the >+ * setup_data mechanism (so must have a setup_data header) but which >+ * is embedded in boot_params because qemu has been unable to find >+ * a safe data space for it. The value RNG_SEED_LENGTH must not >+ * change (pad length dependent on it) and must match the value in QEMU >+ */ >+#define RNG_SEED_LENGTH 32 >+struct random_seed_data { >+ struct setup_data s; >+ __u8 data[RNG_SEED_LENGTH]; >+} __attribute__((packed)); >+ > /* The so-called "zeropage" */ > struct boot_params { > struct screen_info screen_info; /* 0x000 */ >@@ -228,7 +241,8 @@ struct boot_params { > struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */ > __u8 _pad8[48]; /* 0xcd0 */ > struct edd_info eddbuf[EDDMAXNR]; /* 0xd00 */ >- __u8 _pad9[276]; /* 0xeec */ >+ struct random_seed_data random_seed; /* 0xeec */ >+ __u8 _pad9[228]; /* 0xf1c */ > } __attribute__((packed)); > > /** >diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c >index 6b58610a1552..fb719682579d 100644 >--- a/arch/x86/kernel/kexec-bzimage64.c >+++ b/arch/x86/kernel/kexec-bzimage64.c >@@ -110,13 +110,10 @@ static int setup_e820_entries(struct boot_params *params) > return 0; > } > >-enum { RNG_SEED_LENGTH = 32 }; >- > static void >-setup_rng_seed(struct boot_params *params, unsigned long params_load_addr, >- unsigned int rng_seed_setup_data_offset) >+setup_rng_seed(struct boot_params *params, unsigned long params_load_addr) > { >- struct setup_data *sd = (void *)params + rng_seed_setup_data_offset; >+ struct setup_data *sd = ¶ms->random_seed.s; > unsigned long setup_data_phys; > > if (!rng_is_initialized()) >@@ -125,7 +122,8 @@ setup_rng_seed(struct boot_params *params, unsigned long params_load_addr, > sd->type = SETUP_RNG_SEED; > sd->len = RNG_SEED_LENGTH; > get_random_bytes(sd->data, RNG_SEED_LENGTH); >- setup_data_phys = params_load_addr + rng_seed_setup_data_offset; >+ setup_data_phys = params_load_addr + offsetof(struct boot_params, >+ random_seed); > sd->next = params->hdr.setup_data; > params->hdr.setup_data = setup_data_phys; > } >@@ -306,7 +304,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params, > } > > /* Setup RNG seed */ >- setup_rng_seed(params, params_load_addr, setup_data_offset); >+ setup_rng_seed(params, params_load_addr); > > /* Setup EDD info */ > memcpy(params->eddbuf, boot_params.eddbuf, > NAK. We need to fix the actual problem of the kernel stomping on memory it shouldn't, not paper around it.
On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote: [...] > NAK. We need to fix the actual problem of the kernel stomping on > memory it shouldn't, not paper around it. This is a first boot situation, not kexec (I just updated kexec because it should use any new mechanism we propose). Unlike kexec, for first boot we're very constrained by the amount of extra space QEMU has to do this. The boot_params are the first page of the kernel load, but the kernel proper begins directly after it, so we can't expand it. The two schemes tried: loading after the kernel and loading after the command line both tamper with integrity protected files, so we shouldn't use this mechanism. This is the essence of the problem: If we add this area at boot, it has to go in an existing memory location; we can't steal random guest areas. All current config parameters are passed through as fw_config files, so we can only use that mechanism *if* we know where the area ends up in the loaded kernel *and* the file isn't integrity protected (this latter is expanding over time). If we could wind back time, I'd have added the 32 byte random seed to boot_params properly not coded it as a setup_data addition, but now we're stuck with coping with existing behaviour, which is why I thought the retro fit to boot_params would be the better path forward, but if you have any alternatives, I'm sure we could look at them. James
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote: >On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote: >[...] >> NAK. We need to fix the actual problem of the kernel stomping on >> memory it shouldn't, not paper around it. > >This is a first boot situation, not kexec (I just updated kexec because >it should use any new mechanism we propose). Unlike kexec, for first >boot we're very constrained by the amount of extra space QEMU has to do >this. The boot_params are the first page of the kernel load, but the >kernel proper begins directly after it, so we can't expand it. The two >schemes tried: loading after the kernel and loading after the command >line both tamper with integrity protected files, so we shouldn't use >this mechanism. This is the essence of the problem: If we add this >area at boot, it has to go in an existing memory location; we can't >steal random guest areas. All current config parameters are passed >through as fw_config files, so we can only use that mechanism *if* we >know where the area ends up in the loaded kernel *and* the file isn't >integrity protected (this latter is expanding over time). > >If we could wind back time, I'd have added the 32 byte random seed to >boot_params properly not coded it as a setup_data addition, but now >we're stuck with coping with existing behaviour, which is why I thought >the retro fit to boot_params would be the better path forward, but if >you have any alternatives, I'm sure we could look at them. > >James > The right thing to do is to fix the kernel so that it doesn't stomp on this memory, just as it cannot stomp on boot_params, initrd, or the command line. The kernel boot protocol defines a keep-out area, but physical kaslr violates it and so the kaslr code in the decompressor is responsible for keeping track of the keepout areas, and apparently noone every did. Adding it to boot_params and bumping the version number is a hack that doesn't solve the backwards compatibility problem, so we should just fix the bug instead. Adding it to boot_params and adding a setup_data pointer MAY be backwards compatible, but it also enables an absolutely catastrophic failure mode: an unaware loader may end up relocating boot_params without knowing that that data has a secondary pointer into it, with the result that we now have a bogus pointer in a linked list. Not good. Fixing the bug properly is also the only way forward for future setup_data users, and we are running low on space in the fixed-size structures.
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote: >On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote: >[...] >> NAK. We need to fix the actual problem of the kernel stomping on >> memory it shouldn't, not paper around it. > >This is a first boot situation, not kexec (I just updated kexec because >it should use any new mechanism we propose). Unlike kexec, for first >boot we're very constrained by the amount of extra space QEMU has to do >this. The boot_params are the first page of the kernel load, but the >kernel proper begins directly after it, so we can't expand it. The two >schemes tried: loading after the kernel and loading after the command >line both tamper with integrity protected files, so we shouldn't use >this mechanism. This is the essence of the problem: If we add this >area at boot, it has to go in an existing memory location; we can't >steal random guest areas. All current config parameters are passed >through as fw_config files, so we can only use that mechanism *if* we >know where the area ends up in the loaded kernel *and* the file isn't >integrity protected (this latter is expanding over time). > >If we could wind back time, I'd have added the 32 byte random seed to >boot_params properly not coded it as a setup_data addition, but now >we're stuck with coping with existing behaviour, which is why I thought >the retro fit to boot_params would be the better path forward, but if >you have any alternatives, I'm sure we could look at them. > >James > Somewhat aside... There are other issues with passing the entropy seed in memory, of course; in order to avoid accidental or malicious reuse it would be far better if the entropy was actively pulled at use time via virtual I/O, a hypercall, or RDRAND/RDSEED emulation, but I do realize that that is not always an option.
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote: >On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote: >[...] >> NAK. We need to fix the actual problem of the kernel stomping on >> memory it shouldn't, not paper around it. > >This is a first boot situation, not kexec (I just updated kexec because >it should use any new mechanism we propose). Unlike kexec, for first >boot we're very constrained by the amount of extra space QEMU has to do >this. The boot_params are the first page of the kernel load, but the >kernel proper begins directly after it, so we can't expand it. The two >schemes tried: loading after the kernel and loading after the command >line both tamper with integrity protected files, so we shouldn't use >this mechanism. This is the essence of the problem: If we add this >area at boot, it has to go in an existing memory location; we can't >steal random guest areas. All current config parameters are passed >through as fw_config files, so we can only use that mechanism *if* we >know where the area ends up in the loaded kernel *and* the file isn't >integrity protected (this latter is expanding over time). > >If we could wind back time, I'd have added the 32 byte random seed to >boot_params properly not coded it as a setup_data addition, but now >we're stuck with coping with existing behaviour, which is why I thought >the retro fit to boot_params would be the better path forward, but if >you have any alternatives, I'm sure we could look at them. > >James > One option that you do have that should be backwards compatible, even, is to mark that memory as reserved in the memory map, basically doing the job that the kernel decompressor should have done in the first place. The downside is that existing kennels will never reclaim that memory, but since it is such a small amount, it shouldn't really matter. We could even reserve a memory type code for it; that way a newer kernel could know to reclaim that memory at the very end of the boot process, when it would deallocate other setup_data entries. Existing kernels will, for obvious reasons, treat unknown memory types as equivalent to type 2 – permanent keep out.
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 83c57c6eb1..11e8dd7ca7 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m) pcmc->default_nic_model = "e1000e"; pcmc->pci_root_uid = 0; pcmc->default_cpu_version = 1; + pcmc->legacy_no_rng_seed = true; m->family = "pc_q35"; m->desc = "Standard PC (Q35 + ICH9, 2009)"; @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL, static void pc_q35_7_1_machine_options(MachineClass *m) { - PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_7_2_machine_options(m); - pcmc->legacy_no_rng_seed = true; compat_props_add(m->compat_props, hw_compat_7_1, hw_compat_7_1_len); compat_props_add(m->compat_props, pc_compat_7_1,
The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed via setup_data entry") modifies the kernel image file to append a random seed. Obviously this makes the hash of the kernel file non-deterministic and so breaks both measured and some signed boots. The commit notes it's only for non-EFI (because EFI has a different RNG seeding mechanism) so, since there are no non-EFI q35 systems, this should be disabled for the whole of the q35 machine type to bring back deterministic kernel file hashes. Obviously this still leaves the legacy bios case broken for at least measured boot, but I don't think anyone cares about that now. Signed-off-by: James Bottomley <jejb@linux.ibm.com> --- hw/i386/pc_q35.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) pc_compat_7_1_len); }