diff mbox series

x86: fix q35 kernel measurements broken due to rng seeding

Message ID da39abab9785aea2a2e7652ed6403b6268aeb31f.camel@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series x86: fix q35 kernel measurements broken due to rng seeding | expand

Commit Message

James Bottomley Feb. 1, 2023, 1:57 p.m. UTC
The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed
via setup_data entry") modifies the kernel image file to append a
random seed.  Obviously this makes the hash of the kernel file
non-deterministic and so breaks both measured and some signed boots.
The commit notes it's only for non-EFI (because EFI has a different
RNG seeding mechanism) so, since there are no non-EFI q35 systems, this
should be disabled for the whole of the q35 machine type to bring back
deterministic kernel file hashes.

Obviously this still leaves the legacy bios case broken for at least
measured boot, but I don't think anyone cares about that now.

Signed-off-by: James Bottomley <jejb@linux.ibm.com>
---
 hw/i386/pc_q35.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

pc_compat_7_1_len);
 }

Comments

Daniel P. Berrangé Feb. 1, 2023, 2:35 p.m. UTC | #1
On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote:
> The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed
> via setup_data entry") modifies the kernel image file to append a
> random seed.  Obviously this makes the hash of the kernel file
> non-deterministic and so breaks both measured and some signed boots.

I recall raising that at the time

  https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html

and Jason pointed me to a followup which I tested and believe
fixed it for SEV:

  https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html

but it doesn't look like that second patch ever merged. We went
through so many patches I think it probably got obsoleted by
something else, and no one rechecked SEV again.

> The commit notes it's only for non-EFI (because EFI has a different
> RNG seeding mechanism) so, since there are no non-EFI q35 systems, this
> should be disabled for the whole of the q35 machine type to bring back
> deterministic kernel file hashes.

SeaBIOS is the default firmware for both q35 and i440fx. The
majority of systems using q35 will be non-EFI today, and that
is what the random seed was intended to address. I don't think
we can just disable this for the whole of q35.

When you say it breaks measured / signed boots, I presume you
are specifically referring to SEV kernel hashes measurements ?
Or is there a more general problem to solve ?

> Obviously this still leaves the legacy bios case broken for at least
> measured boot, but I don't think anyone cares about that now.
> 
> Signed-off-by: James Bottomley <jejb@linux.ibm.com>
> ---
>  hw/i386/pc_q35.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 83c57c6eb1..11e8dd7ca7 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m)
>      pcmc->default_nic_model = "e1000e";
>      pcmc->pci_root_uid = 0;
>      pcmc->default_cpu_version = 1;
> +    pcmc->legacy_no_rng_seed = true;
>  
>      m->family = "pc_q35";
>      m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL,
>  
>  static void pc_q35_7_1_machine_options(MachineClass *m)
>  {
> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pc_q35_7_2_machine_options(m);
> -    pcmc->legacy_no_rng_seed = true;
>      compat_props_add(m->compat_props, hw_compat_7_1,
> hw_compat_7_1_len);
>      compat_props_add(m->compat_props, pc_compat_7_1,
> pc_compat_7_1_len);
>  }

This patch changes behaviour of the pc-q35-7.2 machine type. Any
change will need to be in latest development 8.0 machine type only

With regards,
Daniel
James Bottomley Feb. 1, 2023, 2:56 p.m. UTC | #2
On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote:
> On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote:
> > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG
> > seed
> > via setup_data entry") modifies the kernel image file to append a
> > random seed.  Obviously this makes the hash of the kernel file
> > non-deterministic and so breaks both measured and some signed
> > boots.
> 
> I recall raising that at the time
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html
> 
> and Jason pointed me to a followup which I tested and believe
> fixed it for SEV:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html
> 
> but it doesn't look like that second patch ever merged. We went
> through so many patches I think it probably got obsoleted by
> something else, and no one rechecked SEV again.

The kernel file problem is a pretty huge one.  OVMF lays it down on an
internal file system and without the second patch, it now contains
random junk at the end.  Anything that hashes the whole file (which
includes not only the measured direct boot but also grub signatures and
probably other bootloader signing mechanisms) will have an issue.

> > The commit notes it's only for non-EFI (because EFI has a different
> > RNG seeding mechanism) so, since there are no non-EFI q35 systems,
> > this should be disabled for the whole of the q35 machine type to
> > bring back deterministic kernel file hashes.
> 
> SeaBIOS is the default firmware for both q35 and i440fx. The
> majority of systems using q35 will be non-EFI today, and that
> is what the random seed was intended to address. I don't think
> we can just disable this for the whole of q35.
> 
> When you say it breaks measured / signed boots, I presume you
> are specifically referring to SEV kernel hashes measurements ?
> Or is there a more general problem to solve ?

No it generally breaks measured/signed boots because it adds random
junk to the kernel file.  The second patch will fix this if you apply
it because setup data isn't measured or signed (yet ... however see the
linux-coco debate about how it should be).

I also note there was a v3 of the patch and considerable discussion
saying it couldn't work:

https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/

Which is likely why it never went in ... although the discussion does
seem to resolve towards the end.

James
Jason A. Donenfeld Feb. 1, 2023, 3:10 p.m. UTC | #3
This is already fixed via the patch that MST just sent in his pull. So wait
a few days for that to be merged and it'll be all set.

No need for this patch here. Do not merge.

On Wed, Feb 1, 2023, 08:57 James Bottomley <jejb@linux.ibm.com> wrote:

> The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG seed
> via setup_data entry") modifies the kernel image file to append a
> random seed.  Obviously this makes the hash of the kernel file
> non-deterministic and so breaks both measured and some signed boots.
> The commit notes it's only for non-EFI (because EFI has a different
> RNG seeding mechanism) so, since there are no non-EFI q35 systems, this
> should be disabled for the whole of the q35 machine type to bring back
> deterministic kernel file hashes.
>
> Obviously this still leaves the legacy bios case broken for at least
> measured boot, but I don't think anyone cares about that now.
>
> Signed-off-by: James Bottomley <jejb@linux.ibm.com>
> ---
>  hw/i386/pc_q35.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index 83c57c6eb1..11e8dd7ca7 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -357,6 +357,7 @@ static void pc_q35_machine_options(MachineClass *m)
>      pcmc->default_nic_model = "e1000e";
>      pcmc->pci_root_uid = 0;
>      pcmc->default_cpu_version = 1;
> +    pcmc->legacy_no_rng_seed = true;
>
>      m->family = "pc_q35";
>      m->desc = "Standard PC (Q35 + ICH9, 2009)";
> @@ -394,9 +395,7 @@ DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL,
>
>  static void pc_q35_7_1_machine_options(MachineClass *m)
>  {
> -    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
>      pc_q35_7_2_machine_options(m);
> -    pcmc->legacy_no_rng_seed = true;
>      compat_props_add(m->compat_props, hw_compat_7_1,
> hw_compat_7_1_len);
>      compat_props_add(m->compat_props, pc_compat_7_1,
> pc_compat_7_1_len);
>  }
> --
> 2.35.3
>
>
>
Jason A. Donenfeld Feb. 1, 2023, 3:12 p.m. UTC | #4
This patch is not needed. It is already fixed in a pending pull. Do not
merge.

On Wed, Feb 1, 2023, 09:57 James Bottomley <jejb@linux.ibm.com> wrote:

> On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote:
> > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote:
> > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG
> > > seed
> > > via setup_data entry") modifies the kernel image file to append a
> > > random seed.  Obviously this makes the hash of the kernel file
> > > non-deterministic and so breaks both measured and some signed
> > > boots.
> >
> > I recall raising that at the time
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html
> >
> > and Jason pointed me to a followup which I tested and believe
> > fixed it for SEV:
> >
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html
> >
> > but it doesn't look like that second patch ever merged. We went
> > through so many patches I think it probably got obsoleted by
> > something else, and no one rechecked SEV again.
>
> The kernel file problem is a pretty huge one.  OVMF lays it down on an
> internal file system and without the second patch, it now contains
> random junk at the end.  Anything that hashes the whole file (which
> includes not only the measured direct boot but also grub signatures and
> probably other bootloader signing mechanisms) will have an issue.
>
> > > The commit notes it's only for non-EFI (because EFI has a different
> > > RNG seeding mechanism) so, since there are no non-EFI q35 systems,
> > > this should be disabled for the whole of the q35 machine type to
> > > bring back deterministic kernel file hashes.
> >
> > SeaBIOS is the default firmware for both q35 and i440fx. The
> > majority of systems using q35 will be non-EFI today, and that
> > is what the random seed was intended to address. I don't think
> > we can just disable this for the whole of q35.
> >
> > When you say it breaks measured / signed boots, I presume you
> > are specifically referring to SEV kernel hashes measurements ?
> > Or is there a more general problem to solve ?
>
> No it generally breaks measured/signed boots because it adds random
> junk to the kernel file.  The second patch will fix this if you apply
> it because setup data isn't measured or signed (yet ... however see the
> linux-coco debate about how it should be).
>
> I also note there was a v3 of the patch and considerable discussion
> saying it couldn't work:
>
> https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/
>
> Which is likely why it never went in ... although the discussion does
> seem to resolve towards the end.
>
> James
>
>
Daniel P. Berrangé Feb. 1, 2023, 3:14 p.m. UTC | #5
On Wed, Feb 01, 2023 at 09:56:35AM -0500, James Bottomley wrote:
> On Wed, 2023-02-01 at 14:35 +0000, Daniel P. Berrangé wrote:
> > On Wed, Feb 01, 2023 at 08:57:10AM -0500, James Bottomley wrote:
> > > The origin commit for rng seeding 67f7e426e5 ("hw/i386: pass RNG
> > > seed
> > > via setup_data entry") modifies the kernel image file to append a
> > > random seed.  Obviously this makes the hash of the kernel file
> > > non-deterministic and so breaks both measured and some signed
> > > boots.
> > 
> > I recall raising that at the time
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00710.html
> > 
> > and Jason pointed me to a followup which I tested and believe
> > fixed it for SEV:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2022-08/msg00601.html
> > 
> > but it doesn't look like that second patch ever merged. We went
> > through so many patches I think it probably got obsoleted by
> > something else, and no one rechecked SEV again.
> 
> The kernel file problem is a pretty huge one.  OVMF lays it down on an
> internal file system and without the second patch, it now contains
> random junk at the end.  Anything that hashes the whole file (which
> includes not only the measured direct boot but also grub signatures and
> probably other bootloader signing mechanisms) will have an issue.
> 
> > > The commit notes it's only for non-EFI (because EFI has a different
> > > RNG seeding mechanism) so, since there are no non-EFI q35 systems,
> > > this should be disabled for the whole of the q35 machine type to
> > > bring back deterministic kernel file hashes.
> > 
> > SeaBIOS is the default firmware for both q35 and i440fx. The
> > majority of systems using q35 will be non-EFI today, and that
> > is what the random seed was intended to address. I don't think
> > we can just disable this for the whole of q35.
> > 
> > When you say it breaks measured / signed boots, I presume you
> > are specifically referring to SEV kernel hashes measurements ?
> > Or is there a more general problem to solve ?
> 
> No it generally breaks measured/signed boots because it adds random
> junk to the kernel file.  The second patch will fix this if you apply
> it because setup data isn't measured or signed (yet ... however see the
> linux-coco debate about how it should be).

BTW, I can't find a reference now, but I recall being told that when
using -kernel,  OVMF won't enforce SecureBoot. ie it'll do the checks
but ignore any failure and carry on. It should still be reflected in
the vTPM measurements though.

> I also note there was a v3 of the patch and considerable discussion
> saying it couldn't work:
> 
> https://lore.kernel.org/qemu-devel/20220804230411.17720-1-Jason@zx2c4.com/
> 
> Which is likely why it never went in ... although the discussion does
> seem to resolve towards the end.

With regards,
Daniel
James Bottomley Feb. 1, 2023, 3:24 p.m. UTC | #6
On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
> This is already fixed via the patch that MST just sent in his pull.
> So wait a few days for that to be merged and it'll be all set.
> 
> No need for this patch here. Do not merge.

If it's not a secret, would it be too much trouble to point to the
branch so we can actually test it?  It does seem that the biggest
problem this issue shows is that there wasn't wide enough configuration
testing done on the prior commits before they were merged.

James
Dov Murik Feb. 1, 2023, 4:41 p.m. UTC | #7
Hi Jason, James,


On 01/02/2023 17:24, James Bottomley wrote:
> On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
>> This is already fixed via the patch that MST just sent in his pull.
>> So wait a few days for that to be merged and it'll be all set.
>>
>> No need for this patch here. Do not merge.
> 
> If it's not a secret, would it be too much trouble to point to the
> branch so we can actually test it?  It does seem that the biggest
> problem this issue shows is that there wasn't wide enough configuration
> testing done on the prior commits before they were merged.
> 

I assume it is:

----
... are available in the Git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to f5cb612867d3b10b86d6361ba041767e02c1b127:

  docs/pcie.txt: Replace ioh3420 with pcie-root-port (2023-01-28 06:21:30 -0500)
----

I checked out this branch and started an SEV guest with measured boot
and it fails during hash verification in OVMF:

BlobVerifierLibSevHashesConstructor: Found injected hashes table in secure location
VerifyBlob: Found GUID 4DE79437-ABD2-427F-B835-D5B172D2045B in table
VerifyBlob: Hash comparison succeeded for "kernel"
VerifyBlob: Found GUID 44BAF731-3A2F-4BD7-9AF1-41E29169781D in table
VerifyBlob: Hash comparison succeeded for "initrd"
VerifyBlob: Found GUID 97D02DD8-BD20-4C94-AA78-E7714D36AB2A in table
VerifyBlob: Hash comparison failed for "cmdline"


(before that patch it was failing on the "kernel" hash.)

I haven't yet examined the suggested fix patch
("[PULL 10/56] x86: don't let decompressed kernel image clobber setup_data") -
just ran the simple test above.


-Dov
Peter Maydell Feb. 1, 2023, 4:50 p.m. UTC | #8
On Wed, 1 Feb 2023 at 15:25, James Bottomley <jejb@linux.ibm.com> wrote:
>
> On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
> > This is already fixed via the patch that MST just sent in his pull.
> > So wait a few days for that to be merged and it'll be all set.
> >
> > No need for this patch here. Do not merge.
>
> If it's not a secret, would it be too much trouble to point to the
> branch so we can actually test it?  It does seem that the biggest
> problem this issue shows is that there wasn't wide enough configuration
> testing done on the prior commits before they were merged.

In general you shouldn't expect commits to be visible in
a git branch before they get merged -- the QEMU process
is not exactly identical to the kernel one.

For a particular patch on the mailing list, you can get a git branch
with it applied by looking for the patch in https://patchew.org/QEMU/
if that's more convenient than just applying it by hand.

We also don't tend to want patches hanging around for testing
before they get merged[*] -- we figure that "in upstream git"
is the place that actually gets tested in practice; almost
nobody will be working with or testing anything else.

[*] The fix Jason refers to here that's in MST's pullreq
unfortunately hasn't made it upstream as quickly as I
would like, due to a combination of things including us
having to pause CI for a week when we ran out of minutes.

thanks
-- PMM
Jason A. Donenfeld Feb. 1, 2023, 5:51 p.m. UTC | #9
It's not a secret, but I have so little internet right now that I can't
even load a webpage, and I'm on my phone, hence the short HTMLified emails.

In brief, though, it gets rid of all modifications to the kernel image all
together, so it should fix your issue.

On Wed, Feb 1, 2023, 10:24 James Bottomley <jejb@linux.ibm.com> wrote:

> On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
> > This is already fixed via the patch that MST just sent in his pull.
> > So wait a few days for that to be merged and it'll be all set.
> >
> > No need for this patch here. Do not merge.
>
> If it's not a secret, would it be too much trouble to point to the
> branch so we can actually test it?  It does seem that the biggest
> problem this issue shows is that there wasn't wide enough configuration
> testing done on the prior commits before they were merged.
>
> James
>
>
James Bottomley Feb. 1, 2023, 7:35 p.m. UTC | #10
On Wed, 2023-02-01 at 16:50 +0000, Peter Maydell wrote:
> On Wed, 1 Feb 2023 at 15:25, James Bottomley <jejb@linux.ibm.com>
> wrote:
> > 
> > On Wed, 2023-02-01 at 10:10 -0500, Jason A. Donenfeld wrote:
> > > This is already fixed via the patch that MST just sent in his
> > > pull.
> > > So wait a few days for that to be merged and it'll be all set.
> > > 
> > > No need for this patch here. Do not merge.
> > 
> > If it's not a secret, would it be too much trouble to point to the
> > branch so we can actually test it?  It does seem that the biggest
> > problem this issue shows is that there wasn't wide enough
> > configuration
> > testing done on the prior commits before they were merged.
> 
> In general you shouldn't expect commits to be visible in
> a git branch before they get merged -- the QEMU process
> is not exactly identical to the kernel one.
> 
> For a particular patch on the mailing list, you can get a git branch
> with it applied by looking for the patch in https://patchew.org/QEMU/
> if that's more convenient than just applying it by hand.

The real issue is there have been so many patches flying around for
this, it's not clear exactly what combination needed to be tested.  Dov
found the branch in tags/for_upstream but it's still failing measured 
direct boot, although the failure has shifted from the hash check of
the kernel to the hash check of the command line.

All I really wanted was a link to the patch ... I don't need the tree
because inspection will tell me that it adds unexpected data at the end
of an integrity checked file.

> We also don't tend to want patches hanging around for testing
> before they get merged[*] -- we figure that "in upstream git"
> is the place that actually gets tested in practice; almost
> nobody will be working with or testing anything else.
> 
> [*] The fix Jason refers to here that's in MST's pullreq
> unfortunately hasn't made it upstream as quickly as I
> would like, due to a combination of things including us
> having to pause CI for a week when we ran out of minutes.

OK, so the problem is still the same as it was before: adding
unexpected data to an integrity checked file.

I don't get why there's all this dancing around trying to find space. 
Surely when the parameter was added, since it was a fixed size, the
kernel header was expanded and the boot protocol version bumped?  So we
can use that to identify kernels which can use this property and have
the space to insert it directly.

James
James Bottomley Feb. 1, 2023, 8:38 p.m. UTC | #11
On Wed, 2023-02-01 at 12:51 -0500, Jason A. Donenfeld wrote:
> It's not a secret, but I have so little internet right now that I
> can't even load a webpage, and I'm on my phone, hence the short
> HTMLified emails.
> 
> In brief, though, it gets rid of all modifications to the kernel
> image all together, so it should fix your issue.

We've already tested it and established it doesn't because you simply
added your rng data to the end of a different integrity protected file
which now fails the integrity check instead of the kernel.

I checked the kernel source as well; I thought you'd have done the
usual thing and bumped the boot protocol version to steal space in
__pad9, but you didn't apparently.  To fix this up after the fact, I
recommend that we still steal space in _pad9[] but we make it have
enough space for a setup_data header as well as the 32 random bytes, so
we've officially reserved the space, but in earlier kernels than this
change gets to you can still use the setup_data_offset method, except
that it now uses the empty space in _pad9 via the setup_data mechanism.
That should find you space and get you out of having to expand any
integrity protected files.  The SEV direct boot will still work because
there's a check further down that doesn't copy the modified header back
over the kernel because it is ignored on efi stub boot anyway.

James
Jason A. Donenfeld Feb. 1, 2023, 8:48 p.m. UTC | #12
Hi James,

On Wed, Feb 1, 2023, 15:39 James Bottomley <jejb@linux.ibm.com> wrote:

> On Wed, 2023-02-01 at 12:51 -0500, Jason A. Donenfeld wrote:
> > It's not a secret, but I have so little internet right now that I
> > can't even load a webpage, and I'm on my phone, hence the short
> > HTMLified emails.
> >
> > In brief, though, it gets rid of all modifications to the kernel
> > image all together, so it should fix your issue.
>
> We've already tested it and established it doesn't because you simply
> added your rng data to the end of a different integrity protected file
> which now fails the integrity check instead of the kernel.
>
> I checked the kernel source as well; I thought you'd have done the
> usual thing and bumped the boot protocol version to steal space in
> __pad9, but you didn't apparently.  To fix this up after the fact, I
> recommend that we still steal space in _pad9[] but we make it have
> enough space for a setup_data header as well as the 32 random bytes, so
> we've officially reserved the space, but in earlier kernels than this
> change gets to you can still use the setup_data_offset method, except
> that it now uses the empty space in _pad9 via the setup_data mechanism.
> That should find you space and get you out of having to expand any
> integrity protected files.  The SEV direct boot will still work because
> there's a check further down that doesn't copy the modified header back
> over the kernel because it is ignored on efi stub boot anyway.


Ahh, it looks like there's now an integrity check on the cmdline file. Darn.

The patch in that PULL is still good and necessary and fixed a *different*
bug, though. So we should still move forward on that.

But it sounds like you might now have a concrete suggestion on something
even better. I'm CCing hpa, as this is his wheelhouse, and maybe you two
can divise the next step while I'm away. Maybe the pad9 thing you mentioned
is the super nice solution we've been searching for this whole time. When
I'm home in 10 days and have internet again, I'll take a look at where
thing's are out and try to figure out how I can be productive again with it.

And sorry again for the short HTML emails. A day ago I was using mosh from
my phone to use mutt on a server to reply to emails downloaded from lore.
But today the cloud cover means the best I can do is queue these up in the
Android gmail client and hope they eventually send.

Jason
James Bottomley Feb. 2, 2023, 2:38 p.m. UTC | #13
On Wed, 2023-02-01 at 15:48 -0500, Jason A. Donenfeld wrote:
[...]
> But it sounds like you might now have a concrete suggestion on
> something even better. I'm CCing hpa, as this is his wheelhouse, and
> maybe you two can divise the next step while I'm away. Maybe the pad9
> thing you mentioned is the super nice solution we've been searching
> for this whole time. When I'm home in 10 days and have internet
> again, I'll take a look at where thing's are out and try to figure
> out how I can be productive again with it.

OK, so just FYI HPA, this is the patch I'm thinking of sending to
linux-kernel to reserve space in struct boot_params for this.  If you
could take a look and advise on the location before I send the final
patch, I'd be grateful.  I took space in _pad9 because that's the
standard method (add on to end), but it does strike me we could also
use all of _pad8 for the (the addition is only 48 bytes) or even _pad3
+ hd0_info + hd1_info.

James

---

diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 9338c68e7413..0120ab77dac9 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -308,7 +308,7 @@ _start:
 	# Part 2 of the header, from the old setup.S
 
 		.ascii	"HdrS"		# header signature
-		.word	0x020f		# header version number (>= 0x0105)
+		.word	0x0210		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
index 01d19fc22346..c614ff0755f2 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -181,6 +181,19 @@ struct ima_setup_data {
 	__u64 size;
 } __attribute__((packed));
 
+/*
+ * Define a boot_param area for the RNG seed which can be used via the
+ * setup_data mechanism (so must have a setup_data header) but which
+ * is embedded in boot_params because qemu has been unable to find
+ * a safe data space for it.  The value RNG_SEED_LENGTH must not
+ * change (pad length dependent on it) and must match the value in QEMU
+ */
+#define RNG_SEED_LENGTH	32
+struct random_seed_data {
+	struct	setup_data s;
+	__u8	data[RNG_SEED_LENGTH];
+} __attribute__((packed));
+
 /* The so-called "zeropage" */
 struct boot_params {
 	struct screen_info screen_info;			/* 0x000 */
@@ -228,7 +241,8 @@ struct boot_params {
 	struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */
 	__u8  _pad8[48];				/* 0xcd0 */
 	struct edd_info eddbuf[EDDMAXNR];		/* 0xd00 */
-	__u8  _pad9[276];				/* 0xeec */
+	struct random_seed_data random_seed;		/* 0xeec */
+	__u8  _pad9[228];				/* 0xf1c */
 } __attribute__((packed));
 
 /**
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 6b58610a1552..fb719682579d 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -110,13 +110,10 @@ static int setup_e820_entries(struct boot_params *params)
 	return 0;
 }
 
-enum { RNG_SEED_LENGTH = 32 };
-
 static void
-setup_rng_seed(struct boot_params *params, unsigned long params_load_addr,
-	       unsigned int rng_seed_setup_data_offset)
+setup_rng_seed(struct boot_params *params, unsigned long params_load_addr)
 {
-	struct setup_data *sd = (void *)params + rng_seed_setup_data_offset;
+	struct setup_data *sd = &params->random_seed.s;
 	unsigned long setup_data_phys;
 
 	if (!rng_is_initialized())
@@ -125,7 +122,8 @@ setup_rng_seed(struct boot_params *params, unsigned long params_load_addr,
 	sd->type = SETUP_RNG_SEED;
 	sd->len = RNG_SEED_LENGTH;
 	get_random_bytes(sd->data, RNG_SEED_LENGTH);
-	setup_data_phys = params_load_addr + rng_seed_setup_data_offset;
+	setup_data_phys = params_load_addr + offsetof(struct boot_params,
+						      random_seed);
 	sd->next = params->hdr.setup_data;
 	params->hdr.setup_data = setup_data_phys;
 }
@@ -306,7 +304,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
 	}
 
 	/* Setup RNG seed */
-	setup_rng_seed(params, params_load_addr, setup_data_offset);
+	setup_rng_seed(params, params_load_addr);
 
 	/* Setup EDD info */
 	memcpy(params->eddbuf, boot_params.eddbuf,
H. Peter Anvin Feb. 2, 2023, 3:03 p.m. UTC | #14
On February 2, 2023 6:38:12 AM PST, James Bottomley <jejb@linux.ibm.com> wrote:
>On Wed, 2023-02-01 at 15:48 -0500, Jason A. Donenfeld wrote:
>[...]
>> But it sounds like you might now have a concrete suggestion on
>> something even better. I'm CCing hpa, as this is his wheelhouse, and
>> maybe you two can divise the next step while I'm away. Maybe the pad9
>> thing you mentioned is the super nice solution we've been searching
>> for this whole time. When I'm home in 10 days and have internet
>> again, I'll take a look at where thing's are out and try to figure
>> out how I can be productive again with it.
>
>OK, so just FYI HPA, this is the patch I'm thinking of sending to
>linux-kernel to reserve space in struct boot_params for this.  If you
>could take a look and advise on the location before I send the final
>patch, I'd be grateful.  I took space in _pad9 because that's the
>standard method (add on to end), but it does strike me we could also
>use all of _pad8 for the (the addition is only 48 bytes) or even _pad3
>+ hd0_info + hd1_info.
>
>James
>
>---
>
>diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
>index 9338c68e7413..0120ab77dac9 100644
>--- a/arch/x86/boot/header.S
>+++ b/arch/x86/boot/header.S
>@@ -308,7 +308,7 @@ _start:
> 	# Part 2 of the header, from the old setup.S
> 
> 		.ascii	"HdrS"		# header signature
>-		.word	0x020f		# header version number (>= 0x0105)
>+		.word	0x0210		# header version number (>= 0x0105)
> 					# or else old loadlin-1.5 will fail)
> 		.globl realmode_swtch
> realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
>diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h
>index 01d19fc22346..c614ff0755f2 100644
>--- a/arch/x86/include/uapi/asm/bootparam.h
>+++ b/arch/x86/include/uapi/asm/bootparam.h
>@@ -181,6 +181,19 @@ struct ima_setup_data {
> 	__u64 size;
> } __attribute__((packed));
> 
>+/*
>+ * Define a boot_param area for the RNG seed which can be used via the
>+ * setup_data mechanism (so must have a setup_data header) but which
>+ * is embedded in boot_params because qemu has been unable to find
>+ * a safe data space for it.  The value RNG_SEED_LENGTH must not
>+ * change (pad length dependent on it) and must match the value in QEMU
>+ */
>+#define RNG_SEED_LENGTH	32
>+struct random_seed_data {
>+	struct	setup_data s;
>+	__u8	data[RNG_SEED_LENGTH];
>+} __attribute__((packed));
>+
> /* The so-called "zeropage" */
> struct boot_params {
> 	struct screen_info screen_info;			/* 0x000 */
>@@ -228,7 +241,8 @@ struct boot_params {
> 	struct boot_e820_entry e820_table[E820_MAX_ENTRIES_ZEROPAGE]; /* 0x2d0 */
> 	__u8  _pad8[48];				/* 0xcd0 */
> 	struct edd_info eddbuf[EDDMAXNR];		/* 0xd00 */
>-	__u8  _pad9[276];				/* 0xeec */
>+	struct random_seed_data random_seed;		/* 0xeec */
>+	__u8  _pad9[228];				/* 0xf1c */
> } __attribute__((packed));
> 
> /**
>diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
>index 6b58610a1552..fb719682579d 100644
>--- a/arch/x86/kernel/kexec-bzimage64.c
>+++ b/arch/x86/kernel/kexec-bzimage64.c
>@@ -110,13 +110,10 @@ static int setup_e820_entries(struct boot_params *params)
> 	return 0;
> }
> 
>-enum { RNG_SEED_LENGTH = 32 };
>-
> static void
>-setup_rng_seed(struct boot_params *params, unsigned long params_load_addr,
>-	       unsigned int rng_seed_setup_data_offset)
>+setup_rng_seed(struct boot_params *params, unsigned long params_load_addr)
> {
>-	struct setup_data *sd = (void *)params + rng_seed_setup_data_offset;
>+	struct setup_data *sd = &params->random_seed.s;
> 	unsigned long setup_data_phys;
> 
> 	if (!rng_is_initialized())
>@@ -125,7 +122,8 @@ setup_rng_seed(struct boot_params *params, unsigned long params_load_addr,
> 	sd->type = SETUP_RNG_SEED;
> 	sd->len = RNG_SEED_LENGTH;
> 	get_random_bytes(sd->data, RNG_SEED_LENGTH);
>-	setup_data_phys = params_load_addr + rng_seed_setup_data_offset;
>+	setup_data_phys = params_load_addr + offsetof(struct boot_params,
>+						      random_seed);
> 	sd->next = params->hdr.setup_data;
> 	params->hdr.setup_data = setup_data_phys;
> }
>@@ -306,7 +304,7 @@ setup_boot_parameters(struct kimage *image, struct boot_params *params,
> 	}
> 
> 	/* Setup RNG seed */
>-	setup_rng_seed(params, params_load_addr, setup_data_offset);
>+	setup_rng_seed(params, params_load_addr);
> 
> 	/* Setup EDD info */
> 	memcpy(params->eddbuf, boot_params.eddbuf,
>

NAK. We need to fix the actual problem of the kernel stomping on memory it shouldn't, not paper around it.
James Bottomley Feb. 2, 2023, 3:17 p.m. UTC | #15
On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
[...]
> NAK. We need to fix the actual problem of the kernel stomping on
> memory it shouldn't, not paper around it.

This is a first boot situation, not kexec (I just updated kexec because
it should use any new mechanism we propose).  Unlike kexec, for first
boot we're very constrained by the amount of extra space QEMU has to do
this.  The boot_params are the first page of the kernel load, but the
kernel proper begins directly after it, so we can't expand it.  The two
schemes tried: loading after the kernel and loading after the command
line both tamper with integrity protected files, so we shouldn't use
this mechanism.  This is the essence of the problem: If we add this
area at boot, it has to go in an existing memory location; we can't
steal random guest areas.  All current config parameters are passed
through as fw_config files, so we can only use that mechanism *if* we
know where the area ends up in the loaded kernel *and* the file isn't
integrity protected (this latter is expanding over time).

If we could wind back time, I'd have added the 32 byte random seed to
boot_params properly not coded it as a setup_data addition, but now
we're stuck with coping with existing behaviour, which is why I thought
the retro fit to boot_params would be the better path forward, but if
you have any alternatives, I'm sure we could look at them.

James
H. Peter Anvin Feb. 2, 2023, 6:56 p.m. UTC | #16
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

The right thing to do is to fix the kernel so that it doesn't stomp on this memory, just as it cannot stomp on boot_params, initrd, or the command line. The kernel boot protocol defines a keep-out area, but physical kaslr violates it and so the kaslr code in the decompressor is responsible for keeping track of the keepout areas, and apparently noone every did.

Adding it to boot_params and bumping the version number is a hack that doesn't solve the backwards compatibility problem, so we should just fix the bug instead. Adding it to boot_params and adding a setup_data pointer MAY be backwards compatible, but it also enables an absolutely catastrophic failure mode: an unaware loader may end up relocating boot_params without knowing that that data has a secondary pointer into it, with the result that we now have a bogus pointer in a linked list. Not good.

Fixing the bug properly is also the only way forward for future setup_data users, and we are running low on space in the fixed-size structures.
H. Peter Anvin Feb. 2, 2023, 7:02 p.m. UTC | #17
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

Somewhat aside...

There are other issues with passing the entropy seed in memory, of course; in order to avoid accidental or malicious reuse it would be far better if the entropy was actively pulled at use time via virtual I/O, a hypercall, or RDRAND/RDSEED emulation, but I do realize that that is not always an option.
H. Peter Anvin Feb. 2, 2023, 7:13 p.m. UTC | #18
On February 2, 2023 7:17:01 AM PST, James Bottomley <jejb@linux.ibm.com> wrote:
>On Thu, 2023-02-02 at 07:03 -0800, H. Peter Anvin wrote:
>[...]
>> NAK. We need to fix the actual problem of the kernel stomping on
>> memory it shouldn't, not paper around it.
>
>This is a first boot situation, not kexec (I just updated kexec because
>it should use any new mechanism we propose).  Unlike kexec, for first
>boot we're very constrained by the amount of extra space QEMU has to do
>this.  The boot_params are the first page of the kernel load, but the
>kernel proper begins directly after it, so we can't expand it.  The two
>schemes tried: loading after the kernel and loading after the command
>line both tamper with integrity protected files, so we shouldn't use
>this mechanism.  This is the essence of the problem: If we add this
>area at boot, it has to go in an existing memory location; we can't
>steal random guest areas.  All current config parameters are passed
>through as fw_config files, so we can only use that mechanism *if* we
>know where the area ends up in the loaded kernel *and* the file isn't
>integrity protected (this latter is expanding over time).
>
>If we could wind back time, I'd have added the 32 byte random seed to
>boot_params properly not coded it as a setup_data addition, but now
>we're stuck with coping with existing behaviour, which is why I thought
>the retro fit to boot_params would be the better path forward, but if
>you have any alternatives, I'm sure we could look at them.
>
>James
>

One option that you do have that should be backwards compatible, even, is to mark that memory as reserved in the memory map, basically doing the job that the kernel decompressor should have done in the first place. The downside is that existing kennels will never reclaim that memory, but since it is such a small amount, it shouldn't really matter.

We could even reserve a memory type code for it; that way a newer kernel could know to reclaim that memory at the very end of the boot process, when it would deallocate other setup_data entries. Existing kernels will, for obvious reasons, treat unknown memory types as equivalent to type 2 – permanent keep out.
diff mbox series

Patch

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 83c57c6eb1..11e8dd7ca7 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -357,6 +357,7 @@  static void pc_q35_machine_options(MachineClass *m)
     pcmc->default_nic_model = "e1000e";
     pcmc->pci_root_uid = 0;
     pcmc->default_cpu_version = 1;
+    pcmc->legacy_no_rng_seed = true;
 
     m->family = "pc_q35";
     m->desc = "Standard PC (Q35 + ICH9, 2009)";
@@ -394,9 +395,7 @@  DEFINE_Q35_MACHINE(v7_2, "pc-q35-7.2", NULL,
 
 static void pc_q35_7_1_machine_options(MachineClass *m)
 {
-    PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
     pc_q35_7_2_machine_options(m);
-    pcmc->legacy_no_rng_seed = true;
     compat_props_add(m->compat_props, hw_compat_7_1,
hw_compat_7_1_len);
     compat_props_add(m->compat_props, pc_compat_7_1,